
Vol. 31, No. 2, June, 1960.





THE ANNALS 
OF MATHEMATICAL STATISTICS 




Authors will ordinarily receive only galley proofs. Fifty reprints without covers will be
furnished free. Additional reprints and covers will be furnished at cost.




COMPOSED AND PRINTED AT THE
WAVERLY PRESS, Inc., Baltimore, Maryland, U. S. A.
Second-class postage paid at Baltimore, Maryland





EDITORIAL STAFF 


Editor
WILLIAM KRUSKAL
Associate Editors
ALLAN BIRNBAUM DONALD A. DARLING OSCAR KEMPTHORNE 
Z. W. BIRNBAUM WASSILY HOEFFDING E. L. LEHMANN 
DOUGLAS G. CHAPMAN N. L. JOHNSON DAVID L. WALLACE 
W. S. CONNER 
WITH THE COOPERATION OF 


J. F. Daly          Harry Kesten        J. W. Pratt
Cyrus Derman        C. H. Kraft         Howard Raiffa
J. L. Doob          Solomon Kullback    H. E. Robbins
Meyer Dwass         Eugene Lukacs       Walter L. Smith
D. A. S. Fraser     G. E. Noether       Lionel Weiss
Samuel Karlin       Ingram Olkin


Past Editors of the Annals


H. C. Carver, 1930-1938          T. W. Anderson, 1950-1952
S. S. Wilks, 1938-1949           E. L. Lehmann, 1953-1955
T. E. Harris, 1955-1958


Published quarterly by the Institute of Mathematical Statistics in March, 
June, September and December. 


IMS INSTITUTIONAL MEMBERS 


Aberdeen Proving Grounds (Ballistic Research Laboratories), Aberdeen, Maryland

Aerojet-General Corporation, P. O. Box 206, Azusa, California

American Viscose Corporation, Marcus Hook, Pennsylvania

Bell Telephone Laboratories, Inc., Technical Library, 463 West Street, New York 14,
New York

Bendix Aviation Corporation, 1200 Fisher Bldg., Detroit, Michigan

Boeing Airplane Company, Box 3707, Seattle, Washington

California Research Corporation, P. O. Box 1627, Richmond, California

Columbia University, New York 27, N. Y.

Cornell University, Mathematics Department, Ithaca, New York

General Analysis Corporation, 11753 Wilshire Blvd., Los Angeles 25, California

Indiana University, The Library, Bloomington, Indiana

International Business Machines Corporation, Mathematics and Applied Science
Library, 1271 Avenue of the Americas, New York 20, N. Y.

Iowa State University, Statistical Laboratory, Ames, Iowa

Lockheed Aircraft Corporation, Engineering Library, Burbank, California

Michigan State University, Department of Statistics, East Lansing, Michigan

Minnesota Mining and Manufacturing Company, Applied Mathematics and Statistics,
St. Paul, Minnesota

Monsanto Chemical Company, 800 North Lindbergh Blvd., St. Louis 66, Missouri

National Cash Register Company, Research Department, Main and K Streets,
Dayton 9, Ohio

National Security Agency, Fort George G. Meade, Maryland

Northwestern University, Department of Mathematics, Evanston, Illinois

Princeton University, Department of Mathematics, Section of Mathematical
Statistics, Princeton, New Jersey

Purdue University Libraries, Lafayette, Indiana

Radio Corporation of America, R.C.A. Laboratories Library, Princeton, New Jersey

Remington Rand-Univac Division, 315 Park Avenue South, New York 10, N. Y.

Sandia Corporation, Sandia Base, Albuquerque, New Mexico

Southern Methodist University Mathematics Department, Dallas 5, Texas

Space Technology Laboratories, P. O. Box 95001, Los Angeles 45, California

Stanford University, Girshick Memorial Library, Stanford, California

State University of Iowa, Iowa City, Iowa







The Atlantic Refining Company, 2700 Passyunk Avenue, Philadelphia, Pa.

The Catholic University of America, Statistical Laboratory, Mathematics Department,
Washington, D. C.

The Ramo-Wooldridge Corporation, Los Angeles, California

Union Carbide Corporation, 30 East 42nd Street, New York 17, New York

Union Oil Company of California, Union Research Center, Box 76, Brea, California

United States Steel Corporation Library, Monroeville, Penna.

University of California, Statistical Laboratory, Berkeley, California

University of Illinois, Serials Department, Urbana, Illinois

University of North Carolina, Department of Statistics, Chapel Hill, North Carolina

University of Puerto Rico, School of Tropical Medicine, San Juan, Puerto Rico

University of Washington, Laboratory of Statistical Research, Seattle, Washington

W. R. Grace and Company, Research Division, Washington Research Center,
Clarksville, Maryland

W. R. Grace and Company, Dewey and Almy Chemical Division, 62 Whittemore Avenue,
Cambridge 40, Massachusetts





THE STATISTICAL WORK OF DAVID VAN DANTZIG (1900-1959) 


By J. HEMELRIJK
Technological University of Delft; Mathematical Centre, Amsterdam 


1. Introduction. The death of David van Dantzig on July 22nd 1959 bereaved 
Holland of its foremost mathematical statistician and put an untimely end to 
his work. His importance for the development of research in and application of 
probability and statistics in Holland has been large; we lose not only an out- 
standing mathematician but also a pioneer. A short outline of his life and of his 
statistical work is given in the following sections. One topic is discussed in some 
detail: an application of his theory of “collective marks” to rank correlation. The 
theory of collective marks is perhaps his most interesting and most promising 
contribution to the theory of probability and statistics. The example mentioned 
has as yet only been published in mimeographed form [34]; it is a nice demon- 
stration of the power of the method used and it may show the way to further 
applications. 

A bibliography of probabilistic and statistical papers is given; a bibliography 
of all van Dantzig’s writings can be found at the end of [40] and a more extensive 
discussion of van Dantzig’s statistical work in [41]. 


2. His life. van Dantzig started his mathematical career as a pure mathematician:
his main subjects before the war were topology, differential geometry,
philosophy (in particular significs, "significa", and the foundations of
mathematics and physics) and mathematical physics. From 1932 onward he
lectured at the Technical University at Delft, first as a Lecturer and later (1938)
as Professor.

During the war he was discharged by the Germans and studied probability 
and statistics. Holland was at that moment decidedly an underdeveloped coun- 
try as far as these subjects were concerned and he put himself to the task of 
changing this. He emerged from the war as an outstanding statistician and was 
at once given a central place as such at the municipal University of Amsterdam 
as Professor in the Theory of Collective Phenomena. Together with Prof. J. G. 
Van der Corput and Prof. J. F. Koksma he founded the “Mathematical Centre” 
in Amsterdam, an institution, subsidised by government and industry, which 
unites all branches of pure and applied mathematics in one organisation. As 
head of the Department of Mathematical Statistics he was able to stimulate 
research and consultation to such a degree that general recognition nationally 
and internationally soon followed. He was appointed Fellow of the Institute of 
Mathematical Statistics, the American Statistical Association and the Royal 
Statistical Society. He was a member of the International Statistical Institute 
and, in Holland, of the Royal Academy of Sciences (Koninklijke Akademie van 
Wetenschappen) and, of course, an outstanding member of the Dutch Statistical 
Association (Vereniging voor Statistiek). He was Visiting Professor at the
University of California (Berkeley) in 1951 and worked for some time at the National
Bureau of Standards in Washington, D. C.

Received February 5, 1960.

His impact on statistics in Holland was formidable. He instilled his pupils with 
his sharp and critical way of thinking, which is manifest in his mathematical 
work as well as in his views on foundations and consultation. As an objectivist 
and frequentist he was apt to be sharp in his criticism of other views, [31], [33],
and in consultation he adhered strictly to high standards of exactness and
precision.

During the last years of his life he supervised a group of mathematicians in 
the departments of Mathematical Statistics and of Applied Mathematics at the 
Mathematical Centre, who worked on the problem of the high water levels of 
the North Sea. This work was initiated by the Dutch government’s “Delta 
Commission” following the disastrous flood of 1953. Partial publications on this 
subject are [25], [29], [35] and [36]. A final version of a report of several hundreds 
of pages was lying on his desk on the day of his death. He had been putting final 
touches to the manuscript, which will be published as a book, one volume of the 
report of the Delta Commission to the government. 


3. His theory of collective marks. The theory of collective marks has been
extensively described and applied in [4], [5], [22], [27], [34] and [37]. Here only the
fundamental ideas will be given and one example of their application. For simplicity,
we only consider the case of a discrete random variable assuming the
values


(1) $x_1, x_2, \dots$ with probabilities $p_1, p_2, \dots$ $\left(\sum_i p_i = 1\right)$.


Generalisation to other cases is straightforward, although sometimes complicated.


The collective mark of the probability distribution is defined by


(2) $C \overset{\mathrm{def}}{=} \sum_i p_i A_i$;¹


where the $A_i$ ($i = 1, 2, \dots$) and $C$ are abstract symbols, for which all kinds of
substitutions can be considered.

If $E$ is an arbitrary possible event and $P(\bar{E} \mid x_i)$ is the conditional probability,
given $x_i$, that $E$ will not occur, then substitution of $P(\bar{E} \mid x_i)$ for $A_i$ leads to


(3) $\sum_i p_i\, P(\bar{E} \mid x_i) = P(\bar{E})$;


i.e. this substitution transforms $C$ into $P(\bar{E})$.² This simple property is fundamental
for the application of the theory.

Another interesting substitution is $A^i$ for $A_i$, where the $A^i$ are numbers of
some kind, giving


(4) $C(A) \overset{\mathrm{def}}{=} \sum_i p_i A^i$.
¹ def indicates that the left hand member is defined by the right hand one.


² The use of $\bar{E}$ instead of $E$ in this formula turns out later to be convenient for
application, so it is introduced from the start.







The $p_i$ are uniquely determined if $C(A)$ is known. Taking $A$ as a complex
number this is the generating function of the $p_i$; with $A_i = \exp(\tau x_i)$, (2) yields
the moment generating function and with $A_i = \exp[itx_i]$ (real $t$) the characteristic
function is obtained. Thus the collective mark is a generalisation of these functions.
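The substitutions just described can be sketched numerically. The following is a minimal illustration only: the distribution `ps` on the values `xs` and all function names are assumptions made for the example and do not come from van Dantzig's papers.

```python
import cmath
import math

def collective_mark(probs, marks):
    """Evaluate C = sum_i p_i * A_i once concrete values are substituted for the A_i."""
    return sum(p * a for p, a in zip(probs, marks))

# Hypothetical example distribution: x_i = i with these probabilities.
xs = [1, 2, 3]
ps = [0.2, 0.5, 0.3]

def generating_function(A):
    # Substitution A_i -> A**i, as in (4).
    return collective_mark(ps, [A ** i for i in range(1, len(ps) + 1)])

def moment_generating_function(tau):
    # Substitution A_i -> exp(tau * x_i).
    return collective_mark(ps, [math.exp(tau * x) for x in xs])

def characteristic_function(t):
    # Substitution A_i -> exp(i * t * x_i), t real.
    return collective_mark(ps, [cmath.exp(1j * t * x) for x in xs])
```

Each function is the same sum (2) with a different choice of marks, which is the sense in which the collective mark generalises the three classical transforms.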

Van Dantzig was not the only one considering this kind of general functional,
but he gave a new turn to the theory by introducing an imaginary event $E$
(which he called "a catastrophe; e.g. the ignition of a red light") which was
associated with a realization of the random variable by means of a random
experiment ("lottery") in the following way: if $x_i$ occurred then $\bar{E}$ (non-$E$)
occurs with a prescribed probability


(5) $P(\bar{E} \mid x_i)$.


According to (3), the unconditional probability $P(\bar{E})$ that no catastrophe
occurs is then identical with the collective mark if (5) is substituted for $A_i$.
However, the probabilities (5) can be chosen arbitrarily, giving ample latitude
for manipulation. An adroit choice of (5) may make it easy to compute $P(\bar{E})$
directly and then $C$, so that e.g., the characteristic function follows at once.

Thus the collective mark can be given an interpretation as a probability and 
the way to short-cut derivations of important results is opened. This method 
has been applied successfully by van Dantzig and some of his pupils, and it cer- 
tainly deserves further consideration. 
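The probabilistic reading of the collective mark can be checked by a small simulation. This is a sketch under stated assumptions: the distribution `ps`, the choice $P(\bar{E} \mid x_i) = A^i$ (realised here by drawing $i$ independent lottery tickets, each harmless with probability $A$), and the function names are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical distribution: value i (i = 1, 2, 3) occurs with probability ps[i - 1].
ps = [0.2, 0.5, 0.3]

def exact_no_catastrophe(probs, A):
    """Collective mark with A_i = A**i, i.e. the generating function C(A)."""
    return sum(p * A ** i for i, p in enumerate(probs, start=1))

def simulated_no_catastrophe(probs, A, trials=100_000, seed=0):
    """Estimate P(no catastrophe) by playing the lottery directly."""
    rng = random.Random(seed)
    values = list(range(1, len(probs) + 1))
    survived = 0
    for _ in range(trials):
        x = rng.choices(values, weights=probs)[0]
        # x independent tickets; each causes a catastrophe with probability 1 - A.
        if all(rng.random() < A for _ in range(x)):
            survived += 1
    return survived / trials
```

With these assumed inputs the simulated frequency should agree with `exact_no_catastrophe` up to Monte Carlo error, which is the content of (3).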


4. Application to rank correlation. [34] To illustrate this method consider
$m$ independent random samples $S_1, \dots, S_m$ from the same continuous population


(6) $x_{\lambda,i}$ ($\lambda = 1, \dots, m$; $i = 1, \dots, n_\lambda$; $\sum_\lambda n_\lambda = n$).

The statistic


(7) $T \overset{\mathrm{def}}{=}$ the number of pairs $(x_{\lambda,i}, x_{\mu,j})$ with $\lambda < \mu$ and $x_{\lambda,i} > x_{\mu,j}$

($\lambda, \mu = 1, \dots, m$; $i = 1, \dots, n_\lambda$; $j = 1, \dots, n_\mu$)


is fundamental in rank correlation, and its distribution is known. $T$ can assume
integer values from 0 to $N = \sum_{\lambda<\mu} n_\lambda n_\mu$. The method of collective marks allows
a straightforward derivation of the collective mark and the characteristic
function of $T$.

Let $h$ marbles, numbered $1, \dots, h$, roll along the $x$-axis starting from $x = +\infty$
and coming to rest according to independent random drawings from the continuous
population considered. The probability that two marbles come to rest
at the same point is then zero, and all permutations of the numbers $1, \dots, h$
as read from the marbles along the $x$-axis have the same probability, $(h!)^{-1}$.
Also, for any given subgroup of marbles all permutations are equally probable
and independent of the permutations in other subgroups having no marbles in
common with the former. For any two or more subgroups without common
elements all possible situations with respect to each other are equally probable.







All this remains valid if the marbles are rolled one at a time in the order of
their numbers, no two being in motion at the same time. Now each time a rolling
marble on its way passes an "earlier" marble which has come to rest already,
let a ticket be drawn at random and with replacement from a lottery $L$ with
probability $1 - A$ of a catastrophe. Consider the conduct of the $k$th marble. All
permutations of the first $k$ marbles have equal probability and therefore the
probability that the $k$th marble passes $0, 1, \dots, k - 1$ marbles is $k^{-1}$ for each
of these possibilities, independent of the number of passings realised by the first
$k - 1$ marbles. The marginal probability that the $k$th marble does not cause a
catastrophe is therefore


(8) $k^{-1}(1 + A + A^2 + \dots + A^{k-1}) = (1 - A^k)/[k(1 - A)]$.


For all $h$ marbles together we thus find, because of the independence mentioned
already, that the probability of no catastrophe is


(9) $P_h(\bar{E}) = (1 - A)^{-h} \prod_{k=1}^{h} (1 - A^k)/k = \{h!\,(1 - A)^h\}^{-1} \prod_{k=1}^{h} (1 - A^k)$.


According to Section 3 this is also the collective mark of the total number of
passings during the whole process of rolling $h$ marbles,


(10) $P_h(\bar{E}) = C_h(A)$.
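Formula (9) can be checked exactly for small $h$ by enumerating all $h!$ equally likely rest orders. The sketch below is illustrative only; the function names are hypothetical, and it assumes the marbles roll in from the right, so that a marble passes every earlier marble resting at a larger coordinate.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def passings_distribution(h):
    """Exact distribution of the total number of passings for h marbles."""
    counts = [0] * (h * (h - 1) // 2 + 1)
    for ranks in permutations(range(h)):
        # ranks[k] is the rank of marble k's resting place along the axis;
        # marble k passes earlier marble j when it comes to rest beyond it.
        passings = sum(1 for k in range(h) for j in range(k) if ranks[k] < ranks[j])
        counts[passings] += 1
    return [Fraction(c, factorial(h)) for c in counts]

def collective_mark_coefficients(h):
    """Coefficients in A of prod_{k=1}^{h} (1 + A + ... + A^(k-1)) / k, as in (8) and (9)."""
    poly = [Fraction(1)]
    for k in range(1, h + 1):
        new = [Fraction(0)] * (len(poly) + k - 1)
        for i, c in enumerate(poly):
            for j in range(k):
                new[i + j] += c / k
        poly = new
    return poly
```

For any small `h` the two lists coincide, which is the statement that $C_h(A)$ is the collective mark of the total number of passings.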


Rolling $n$ marbles consecutively for the $m$ samples $S_1, \dots, S_m$, starting with
$n_1$ marbles for sample $S_1$ and ending with $n_m$ for $S_m$, this also holds for all
passings. In order to link this up with the statistic $T$ we consider two kinds of
passings,

1. "Same sample passings" of marbles passing earlier marbles of the same
sample;

2. "Different sample passings" of marbles of a sample $S_\mu$ passing marbles of
$S_\lambda$ with $\mu > \lambda$.

$T$ is the total number of different sample passings.

For the same sample passings of $S_\lambda$ we have, according to (9), in an obvious
notation,


(11) $P_{n_\lambda,\,\mathrm{same}}(\bar{E}) = C_{n_\lambda}(A)$ ($\lambda = 1, \dots, m$),


and, again using the independence of passings within different samples, we find
for all same sample passings:


(12) $P_{\mathrm{same}}(\bar{E}) = \prod_{\lambda=1}^{m} C_{n_\lambda}(A)$.


The total number of passings is equal to the sum of the numbers of both kinds
and these two numbers are again stochastically independent. Thus we find


(13) $P_{\mathrm{total}}(\bar{E}) = P_{\mathrm{same}}(\bar{E}) \cdot P_{\mathrm{different}}(\bar{E})$,







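where the last factor may also be written $P_T(\bar{E})$. Therefore


(14) $C_n(A) = \prod_{\lambda=1}^{m} C_{n_\lambda}(A) \cdot C_T(A)$,


where


(15) $C_T(A) = \sum_{T=0}^{N} p_T A^T = P_{\mathrm{different}}(\bar{E})$ $\left(N = \sum_{\lambda<\mu} n_\lambda n_\mu\right)$


is the collective mark of $T$.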
From (9), (14), and (15) we obtain


(16) $C_T(A) = \sum_{T=0}^{N} p_T A^T = C_n(A) \Big/ \prod_{\lambda=1}^{m} C_{n_\lambda}(A) = \dfrac{n_1! \cdots n_m!}{n!} \cdot \dfrac{\prod_{k=1}^{n} (1 - A^k)}{\prod_{\lambda=1}^{m} \prod_{k=1}^{n_\lambda} (1 - A^k)}$.


Substituting $A = e^{it}$ the characteristic function is obtained. For $n_\lambda = 1$
($\lambda = 1, \dots, m$), i.e. for samples of one element each, $n = m$ and $2T - N$ is
equal to M. G. Kendall's ranking statistic $S$; the characteristic function can then
be reduced to


(17) $\mathcal{E} e^{itS} = \prod_{k=1}^{n} (\sin kt)/(k \sin t)$,


a formula derived earlier by Kendall by means of recurrence relations. For
$n_\lambda > 1$ the method pertains to a test for trend in the several-sample problem.
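The factorisation (14), and hence (16), can be verified exactly for small sample sizes by brute force. The following sketch is only an illustration: the helper names are hypothetical, the distribution of $T$ is obtained by enumerating all $n!$ equally likely rank orders, and polynomials in $A$ are stored as coefficient lists.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def mark_of_passings(h):
    """C_h(A) from (9), written as prod_{k=1}^{h} (1 + A + ... + A^(k-1)) / k."""
    poly = [Fraction(1)]
    for k in range(1, h + 1):
        poly = poly_mul(poly, [Fraction(1, k)] * k)
    return poly

def mark_of_T(sizes):
    """C_T(A) by enumeration: coefficient of A^T is the probability that T takes that value."""
    labels = [lam for lam, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    N = sum(sizes[a] * sizes[b] for a in range(len(sizes)) for b in range(a + 1, len(sizes)))
    counts = [0] * (N + 1)
    for ranks in permutations(range(n)):
        # Count pairs with lambda < mu and x_{lambda,i} > x_{mu,j}, as in (7).
        T = sum(1 for i in range(n) for j in range(n)
                if labels[i] < labels[j] and ranks[i] > ranks[j])
        counts[T] += 1
    return [Fraction(c, factorial(n)) for c in counts]

def check_identity_14(sizes):
    """True when C_n(A) equals prod_lambda C_{n_lambda}(A) times C_T(A)."""
    lhs = mark_of_passings(sum(sizes))
    rhs = mark_of_T(sizes)
    for size in sizes:
        rhs = poly_mul(rhs, mark_of_passings(size))
    return lhs == rhs
```

For two samples of two observations each, `mark_of_T((2, 2))` gives the familiar two-sample (Wilcoxon-type) count distribution with weights 1, 1, 2, 1, 1 out of 6.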

The rest of van Dantzig’s statistical and probabilistic work is here presented 
by title only in the bibliography. 


5. Bibliography of van Dantzig’s statistical and probabilistic papers. 


[1] "Mathematical and empirical foundations of probability theory" (in Dutch), Ned. Tijdschr. voor Nat., Vol. 8 (1941), pp. 70-93.
[2] "Correspondence with B. de Finetti" (Punti di Vista), Statistica, Vol. 1 (1941), pp. 229-242 (Milano).
[3] Probability theory, (in Dutch; mimeographed course), Mathematisch Centrum, Amsterdam, 1947.
[4] Mathematical Statistics, (in Dutch; mimeographed course), Mathematisch Centrum, Amsterdam, 1947-48.
[5] "Sur la méthode des fonctions génératrices," Le Calcul des Probabilités, Coll. Int. du Centre Nat. de la Rech. Sc., Vol. 13 (1949), pp. 29-46.
[6] "Blaise Pascal and the meaning of mathematical thinking for the study of human society" (in Dutch; inaugural address), Euclides, Vol. 25 (1948), pp. 203-232.
[7] "Sur l'analyse logique des relations entre le Calcul des Probabilités et ses applications," Congrès Intern. de Phil. des Sciences, 1-8 July 1949, Vol. 4, Calcul des Probabilités; Actualités Scientif. et Industr. Nr. 1148.
[8] "Proposals for the standardization of symbols in mathematical statistics and biometrics," Statistica (Neerlandica), Vol. 4 (1950), pp. 80-86.
[9] "Laplace, probabiliste et statisticien et ses précurseurs," Archives Intern. d'Histoire des Sciences, Vol. 8 (1955), pp. 27-37.
[10] "Une nouvelle généralisation de l'inégalité de Bienaymé," Ann. de l'Inst. Henri Poincaré, Vol. 12 (1951), pp. 31-43.







[11] "Some historical relations between mathematical and descriptive statistics" (in Dutch), Statistica, Vol. 4 (1950), pp. 233-248.

[12] "On the consistency and the power of Wilcoxon's two-sample test," Proc. Kon. Ned. Ak. v. Wet. 54A; Indag. Math., Vol. 13 (1951), pp. 1-8.

[13] "Les problèmes que pose l'application du Calcul des Probabilités," Collection de Logique mathématique, Ser. B, Vol. 1 (1952), pp. 53-65.

[14] "Time-discrete stochastic processes in arbitrary sets, with applications to processes with absorbing regions and to the problem of loops in Markov chains," mimeographed, Mathematisch Centrum, Amsterdam (1952).

[15] "Carnap's foundation of probability theory," Synthese, Vol. 8 (1953), pp. 459-470.

[16] "Mathematical consultation for medical, biological and other research" (in Dutch), Proc. Kon. Ned. Ak. v. Wet., Vol. 61, Indag. Math., Vol. 14 (1952), pp. 13-18.

[17] "Nature as opponent" (in Dutch), Statistica, Vol. 5 (1951), pp. 149-159.

[18] "Utilité d'une distribution de probabilités, ou Distribution des probabilités d'une utilité," Coll. d'Econométrie, Paris, Centre National de la Recherche Scientifique, 1953.

[19] "Another form of the weak law of large numbers," Nieuw Archief v. Wiskunde (3), Vol. 1 (1953), pp. 129-145.

[20] "Prediction and prophesy" (in Dutch), Statistica, Vol. 6 (1952), pp. 195-204.

[21] "Statistical methods based on few assumptions" (with J. Hemelrijk), Bull. of the I.S.I., Vol. 34 (1954), pp. 3-31.

[22] "On arbitrary hereditary time-discrete stochastic processes, considered as stationary Markov chains, and the corresponding general form of Wald's fundamental identity" (with C. Scheffer), Proc. Kon. Ned. Ak. v. Wet., Vol. A 57; Indag. Math., Vol. 16 (1954), pp. 377-388.

[23] "The responsibilities of the statistician" (in Dutch), Statistica, Vol. 7 (1954), pp. 199-208.

[24] "Mathematical consultation in practice" (in Dutch), Euclides, Vol. 30 (1954), pp. 53-67.

[25] "Mathematical problems raised by the flood disaster 1953," Proc. of the Int. Math. Congress (Amsterdam 1954), Vol. I, pp. 218-239.

[26] "Sur les ensembles de confiance généraux et les méthodes dites non paramétriques," Coll. sur l'Analyse Statistique, Bruxelles 1954, pp. 73-91 (édition du Centre Belge de Recherches Mathématiques).

[27] "Chaînes de Markov dans les ensembles abstraits et applications aux processus avec régions absorbantes et au problème des boucles," Ann. de l'Inst. Henri Poincaré XIV, fasc. III (1955), pp. 145-199.

[28] "Ten years of mathematical statistics" (in Dutch), Statistica, Vol. 9 (1955), pp. 233-242.

[29] "Econometric decision problems for flood prevention," Econometrica, Vol. 24 (1956), pp. 276-287.

[30] "A course in Markov chains" (in Dutch; edited by G. Zoutendijk and J. Hirsch), mimeographed, Mathematisch Centrum, Amsterdam, 1956.

[31] "Statistical Priesthood (Savage on personal probabilities)," Statistica Neerlandica, Vol. 11 (1957), pp. 1-16.

[32] "From 'Rekeningh in Spelen van Geluck' to decision theory" (in Dutch), Yearbook II of the University of Amsterdam (1956-57), pp. 39-50.

[33] "Statistical Priesthood II (Sir Ronald on Scientific Inference)," Statistica Neerlandica, Vol. 11 (1957), pp. 185-200.

[34] "Les fonctions génératrices liées à quelques tests nonparamétriques," mimeographed, Mathematisch Centrum, Amsterdam 1957.

[35] Extrapolation of the frequency-line of high water levels at Hoek van Holland by means of selected storms, (in Dutch; with J. Hemelrijk), Report for the Delta Commission, 1959.







[36] The econometric decision problem concerning the prevention of floods in Holland, (in Dutch), Report for the Delta Commission, 1959.

[37] "Itérations markoviennes dans les ensembles abstraits" (with G. Zoutendijk), J. de Math. pure et appl., Vol. 38 (1959), pp. 183-200.

[38] "Sur quelques questions de la théorie mathématique du choix pondéré," Coll. "Sur la Décision", Paris, mai 1959, to appear.

[39] "Prediction and Prophesy", to appear.


Papers about van Dantzig's Life and Work:

[40] J. Hemelrijk, "In memoriam Prof. Dr. D. van Dantzig," Statistica Neerlandica, Vol. 13 (1959), pp. 415-432 (in Dutch).

[41] J. Hemelrijk, "David van Dantzig's statistical work," Synthese, Vol. 11 (1959), pp. 335-351.





STOCHASTIC COMPARISON OF TESTS 


By R. R. Bahadur


Indian Statistical Institute, Calcutta 


1. Introduction. It is shown in [1], in a special case, that the study (as random 
variables) of the levels attained when two alternative tests of the same hypothesis 
are applied to given data affords a method of comparing the performances of the 
tests in large samples. It is the object of the present paper to show that this 
method, which may be called stochastic comparison, is quite generally applicable. 
It is shown here, in particular, that in a given statistical context there is usually 
a wide class of tests such that, if test 1 and test 2 are in the class, the asymptotic 
efficiency of 1 relative to 2 is well defined and readily calculable. The argument 
is stated and discussed in general terms in Sections 2, 3 and 4, and illustrative 
examples are given in Section 5. The examples include comparison of the Wald- 
Wolfowitz test and the Smirnov test for two samples, and of the Kruskal-Wallis 
test and the F test for k samples. 


2. Standard sequences. Consider an abstract sample space $S$ of points $s$, and
suppose that $s$ is distributed in $S$ according to some one of a given set $\{P_\theta\}$ of
probability measures $P_\theta$, where $\theta$ is an abstract parameter taking values in a set
$\Omega$. Let $\Omega_0$ be a subset of $\Omega$, and let $H$ denote the hypothesis that $\theta \in \Omega_0$.

Let $n$ be an index that takes the values $1, 2, 3, \dots$. For each $n$, let $T_n$ be a real
valued statistic defined on $S$. We shall say that $\{T_n\}$ is a standard sequence (for
testing $H$) if the following three conditions are satisfied.

I. There exists a continuous probability distribution function $F$ such that, for
each $\theta \in \Omega_0$,

(1) $\lim_{n \to \infty} P_\theta(T_n < x) = F(x)$ for every $x$.

II. There exists a constant $a$, $0 < a < \infty$, such that

(2) $\log [1 - F(x)] = -\dfrac{a x^2}{2}\,[1 + o(1)]$ as $x \to \infty$.

III. There exists a function $b$ on $\Omega - \Omega_0$, with $0 < b < \infty$, such that, for
each $\theta \in \Omega - \Omega_0$,

(3) $\lim_{n \to \infty} P_\theta\left(\left| \dfrac{T_n}{n^{1/2}} - b(\theta) \right| > x\right) = 0$ for every $x > 0$.


The following is a typical example of a standard sequence. Let $S$ be the set
of all sequences $s = (x_1, x_2, \dots$ ad inf) with real coordinates $x_n$, let $\Omega$ be a set
of distribution functions $\theta(x)$ on the real line such that $\mu(\theta) = \int_{-\infty}^{\infty} x\, d\theta \ge 0$.


Received May 11, 1959; revised November 5, 1959. 


Let $H$ be the hypothesis that $\mu = 0$. For each $n$, let $T_n$ be the $t$ statistic based
on the first $n$ co-ordinates of $s$. Then I is satisfied with


$F(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} \exp(-z^2/2)\, dz$;


this $F$ satisfies II with $a = 1$ (cf. para. 1 in Section 5); and III is satisfied with
$b(\theta) = \mu(\theta)/\sigma(\theta)$, where $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2\, d\theta$. In this example the index $n$
denotes the sample size, and $n$ has essentially the same role in other examples.
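Condition II for the normal distribution function can be illustrated numerically. The sketch below is not part of the original argument; the function names are hypothetical, and the upper tail is evaluated through the complementary error function so that very small tail probabilities remain representable in floating point.

```python
import math

def log_upper_tail_normal(x):
    """log[1 - F(x)] for the standard normal F, via 1 - F(x) = erfc(x / sqrt(2)) / 2."""
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

def condition_II_ratio(x, a=1.0):
    """Ratio of log[1 - F(x)] to -a x^2 / 2; condition II says this tends to 1."""
    return log_upper_tail_normal(x) / (-a * x * x / 2.0)

# Ratios at a few increasing arguments (kept at or below 30 to avoid underflow).
ratios = [condition_II_ratio(x) for x in (5.0, 10.0, 20.0, 30.0)]
```

The ratios decrease towards 1 slowly, in line with the $1 + o(1)$ factor in (2): the correction term is of order $(\log x)/x^2$.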

Returning to the general case, suppose that $\{T_n\}$ is a standard sequence. Then
$T_n$ has the asymptotic distribution $F$ if $H$ obtains, but otherwise $T_n \to \infty$ in
probability. Consequently, large values of $T_n$ are significant when $T_n$ is regarded
as a test statistic for $H$. Accordingly, for any given $s$, we define $1 - F(T_n(s))$
to be the level attained by $T_n$ in the given case ($n = 1, 2, \dots$).

In general, $1 - F(T_n(s))$ is only an approximate level, i.e. for given $n$ and $s$,
it does not equal the probability of $T_n$ being as large or larger than $T_n(s)$ when
$H$ obtains. However, the study of such levels seems legitimate and useful. In
numerous cases of interest, in practice only approximate levels are used; perhaps
because the exact null distribution of $T_n$ is not tabulated and too difficult to
compute, or because $n$ is so large that it is believed unnecessary to refer to the
exact distribution, or even because the "exact level attained by $T_n$" does not
exist, i.e. for the given $n$ the distribution of $T_n$ varies with $\theta$ as $\theta$ varies over
$\Omega_0$. Even in the cases where exact levels exist and are used (or at least in principle
could be used) for every $n$, one hopes that conclusions based on comparisons
of approximate levels would provide at least an indication of what to expect in
comparisons of exact levels. At present exact levels can be compared in only a
few cases, e.g. the cases discussed in [1], because sufficiently precise estimates of
the relevant tail probabilities are not available. This point is discussed further in
remarks 8 and 10 of Section 4.

Now let us regard the level attained by $T_n$ in a given case as a random variable
defined on $S$. It is convenient to describe the behaviour of this random variable
as $n \to \infty$ in terms of $K_n$, where


(4) $K_n(s) = -2 \log [1 - F(T_n(s))]$.

Then, for each $\theta$ in $\Omega_0$,


(5) $\lim_{n \to \infty} P_\theta(K_n < v) = \Pr(\chi_2^2 < v) = 1 - e^{-v/2}$ for every $v > 0$,


where $\chi_2^2$ denotes a chi-square variable with 2 degrees of freedom. Again, with

(6) $c(\theta) = \begin{cases} 0 & \text{for } \theta \in \Omega_0 \\ a[b(\theta)]^2 & \text{for } \theta \in \Omega - \Omega_0 \end{cases}$

we have that, for any given $\theta$ in $\Omega$,

(7) $K_n/n = c(\theta) + \epsilon_n(s, \theta)$,

where $\epsilon_n(s, \theta) \to 0$ in probability as $n \to \infty$.
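Propositions (5) and (7) can be illustrated in the normal case, where $a = 1$. The sketch below rests on simplifying assumptions that are not in the paper: under $H$ the statistic is drawn exactly from the limiting normal distribution, and under the alternative $T_n$ is replaced by its idealised value $n^{1/2} b$ with the random fluctuation dropped. All names are hypothetical.

```python
import math
import random

def upper_tail(x):
    """1 - F(x) for the standard normal F."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def K(t):
    """K_n = -2 log[1 - F(T_n)], definition (4), evaluated at T_n = t."""
    return -2.0 * math.log(upper_tail(t))

def null_fraction_below(v, trials=50_000, seed=1):
    """Monte Carlo estimate of P(K_n < v) when T_n is exactly standard normal."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if K(rng.gauss(0.0, 1.0)) < v)
    return hits / trials

def idealised_slope(b, n):
    """K_n / n with T_n set to sqrt(n) * b; by (7) this should approach a * b^2 = b^2."""
    return K(math.sqrt(n) * b) / n
```

Under these assumptions `null_fraction_below(v)` should be close to $1 - e^{-v/2}$, and `idealised_slope(0.5, n)` should decrease towards $0.25$ as `n` grows.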







To prove the propositions just stated, first consider a $\theta$ in $\Omega_0$. Let $y$ and $z$ be
given constants, $0 < y < z < 1$. Since $F$ is a continuous distribution function,
there exist numbers $a$ and $b$ such that $F(a) = z - y$, $F(b) = z$. Let $A_n = \{s : T_n < a\}$,
$B_n = \{s : F(T_n) < z\}$, and $C_n = \{s : T_n < b\}$. Then $A_n \subset B_n \subset C_n$
for every $n$, and hence $z - y \le \liminf P_\theta(B_n) \le \limsup P_\theta(B_n) \le z$ by (1).
Since $y$ and $z$ are arbitrary, we have $\lim P_\theta(B_n) = z$ for all $z$ in $(0, 1)$. (5) now
follows from (4), and it follows from (5) that (7) is satisfied with $c = 0$.

Now consider a $\theta$ in $\Omega - \Omega_0$. For any $x$, let $f(x)$ be the $o(1)$ term on the right
side of (2), $-1 \le f \le \infty$. It then follows from (4) that $K_n/n$ is identical with
$a(T_n^2/n)[1 + f(T_n)]$. It is plain from this identity and (2), (3), and (6) that (7)
is satisfied, and this completes the proof.

In view of (7) we shall call $c(\theta)$ the asymptotic slope of the tests based on
$\{T_n\}$ (or simply the slope of $\{T_n\}$) when $\theta$ obtains.

The sequence $\{T_n\}$ will be said to be strongly consistent if condition III is satisfied
with (3) replaced by

(8) $P_\theta\left(\lim_{n \to \infty} T_n/n^{1/2} = b(\theta)\right) = 1$,

and if (8) also holds with $b = 0$ for each $\theta$ in $\Omega_0$. It is readily seen that if $\{T_n\}$
is strongly consistent the $\epsilon_n$ in (7) $\to 0$ with probability one.

In concluding this section it may be worthwhile to note that the statistic $K_n^{1/2}$
is equivalent to $T_n$ in the following technical sense: (i) $\{K_n^{1/2}\}$ is a standard sequence,
(ii) for each $\theta$ in $\Omega$, the slope of $\{K_n^{1/2}\}$ equals that of $\{T_n\}$, and (iii) for any given
$n$ and $s$, the level attained by $K_n^{1/2}$ equals the level attained by $T_n$. Since the level
attained by $K_n^{1/2}$ is found by referring $K_n^{1/2}$ to the upper tail of a fixed distribution
independent of $F$, $\{K_n^{1/2}\}$ is (so to speak) a normalised version of $\{T_n\}$. The
normalised version of $\{K_n^{1/2}\}$ is $\{K_n^{1/2}\}$ itself.


3. Comparison of standard sequences. Suppose now that
$\{T_n^{(1)}\} = \{T_1^{(1)}, T_2^{(1)}, \dots\}$ and $\{T_n^{(2)}\} = \{T_1^{(2)}, T_2^{(2)}, \dots\}$
are two standard sequences defined on $S$, and let $F^{(i)}(x)$, $a_i$, and $b_i(\theta)$ be the
functions and constants prescribed by conditions I, II, and III for sequence $i$,
($i = 1, 2$). Consider an arbitrary but fixed $\theta$ in $\Omega - \Omega_0$ and suppose that $s$ is
distributed according to $P_\theta$. It is argued in this section that


(9) $\varphi_{1,2}(\theta) = c_1(\theta)/c_2(\theta)$


then serves as the asymptotic efficiency of sequence 1 relative to sequence 2,
where $c_i = a_i b_i^2$ is the slope of sequence $i$, $i = 1, 2$.

First consider the comparison of attained levels for a given sample size $n$. In
a given instance, i.e. for a given $s$ in $S$, it would be fair to say that the test based
on $T_n^{(i)}$ is less successful than that based on $T_n^{(j)}$ if the level attained by $T_n^{(i)}$
exceeds the level attained by $T_n^{(j)}$, i.e., if $K_n^{(i)} < K_n^{(j)}$, where the $K_n$ are defined by
(4), ($i, j = 1, 2$). Since $\{T_n^{(1)}\}$ and $\{T_n^{(2)}\}$ are standard sequences, it follows from
(7) and (9) that


(10) $K_n^{(1)}/K_n^{(2)} \to \varphi_{1,2}$







in probability as $n \to \infty$. Consequently, with probability tending to one, $T_n^{(1)}$ is
less successful than $T_n^{(2)}$ if $\varphi_{1,2} < 1$ and more successful if $\varphi_{1,2} > 1$. If $\varphi_{1,2} = 1$, the
two tests are equally successful up to terms of the order considered here.

To compare the sample sizes required to attain the same level, for each $i$ let
$m_1^{(i)}, m_2^{(i)}, \dots, m_r^{(i)}, \dots$ be a sequence of positive integers such that


(11) $\lim_{r \to \infty} m_r^{(i)} = \infty$ ($i = 1, 2$).


For simplicity in notation, let $K_n^{(i)}(s)$ be written as $K^{(i)}(n, s)$ and $T_n^{(i)}(s)$ as
$T^{(i)}(n, s)$. We may then say that $m_r^{(1)}$ and $m_r^{(2)}$ are asymptotically equivalent
sample sizes for sequences 1 and 2 respectively if

(12) $K^{(1)}(m_r^{(1)}, s)/K^{(2)}(m_r^{(2)}, s) \to 1$


in probability as $r \to \infty$. In view of the argument of the preceding paragraph,
the defining condition (12) means that, with probability tending to 1 as $r \to \infty$,
$T^{(1)}(m_r^{(1)})$ and $T^{(2)}(m_r^{(2)})$ are equally successful test statistics up to terms of the
order considered here. Now, since (11) is satisfied, we can apply (7) to $K^{(i)}(n)$
with $n$ restricted to the sequence $\{m_r^{(i)}\}$, ($i = 1, 2$). This application shows that
$m_r^{(1)}$ and $m_r^{(2)}$ are asymptotically equivalent sample sizes if and only if


(13) $\lim_{r \to \infty} \{m_r^{(2)}/m_r^{(1)}\} = \varphi_{1,2}$.


It should perhaps be noted here that asymptotically equivalent sample sizes
always exist, e.g. $m_r^{(1)} = r$ and $m_r^{(2)} =$ the integral part of $r\varphi_{1,2} + 1$.
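The construction of asymptotically equivalent sample sizes can be sketched in a few lines. The numerical values of $a_i$ and $b_i$ used in the usage note are purely hypothetical, as are the function names; only the formulas $c_i = a_i b_i^2$, (9) and the choice of $m_r^{(1)}$, $m_r^{(2)}$ come from the text.

```python
import math

def slope(a, b):
    """Asymptotic slope c = a * b^2 of a standard sequence."""
    return a * b * b

def efficiency(c1, c2):
    """Asymptotic efficiency of sequence 1 relative to sequence 2, as in (9)."""
    return c1 / c2

def equivalent_sizes(phi, r):
    """The pair m_r^(1) = r and m_r^(2) = integral part of r * phi + 1."""
    return r, math.floor(r * phi + 1)
```

For instance, with the assumed values $a_1 = a_2 = 1$, $b_1 = 0.6$ and $b_2 = 0.5$ one gets $\varphi_{1,2} = 1.44$, so sequence 2 needs roughly 44 per cent more observations than sequence 1 to attain the same level; `equivalent_sizes(1.44, r)` then has a ratio $m_r^{(2)}/m_r^{(1)}$ that tends to $1.44$, in agreement with (13).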

Now let us consider the case when both the sequences being compared are
strongly consistent. It is plain that in this case (10) is valid as a pointwise limit
for almost all $s$ in $S$. Similarly, (11) and (13) suffice for the validity of (12) as
a pointwise limit for almost all $s$, but in the present case a considerably stronger
interpretation of $\varphi_{1,2}$ can also be given, as follows. For any positive real number $v$,
and for any $s$ in $S$, let $N_i(v, s)$ denote "the sample size required in order that $K_n^{(i)}$
attains the value $v$." $N_i$ is not well defined, but surely $N_i^- \le N_i \le N_i^+$, where
$N_i^- =$ the least $n$ such that $K_{n+1}^{(i)} \ge v$ and $N_i^- = \infty$ if no such $n$ exists, and
$N_i^+ =$ the least $m$ such that $K_n^{(i)} > v$ for all $n \ge m$, and $N_i^+ = \infty$ if no such $m$
exists. Now define $R^-(v, s) = N_2^-/N_1^-$ and $R^+(v, s) = N_2^+/N_1^+$, with the convention
that $\infty/\infty = 1$ (say). Then, except for a set of points $s$ of $P_\theta$ measure zero,
we have

(14) $\lim_{v \to \infty} R^- = \lim_{v \to \infty} R^+ = \varphi_{1,2}$.


To establish (14), choose and fix an $s$ for which

(15)    $K_n^{(i)}/n \to c_i$ as $n \to \infty$    $(i = 1, 2)$.

Since the set of all such points $s$ has probability one, it will suffice to establish
(14) for the chosen $s$. It is clear from (15) that $0 < K_n^{(i)} < \infty$ for all sufficiently
large $n$, and that $K_n^{(i)} \to \infty$ as $n \to \infty$. Consequently, $1 \le N_i^- \le N_i^+ < \infty$ for
all sufficiently large $v$, and $N_i^- \to \infty$ as $v \to \infty$. We observe next that
$K^{(i)}(N_i^-) < v \le K^{(i)}(N_i^- + 1)$, $K^{(i)}(N_i^+ - 1) \le v < K^{(i)}(N_i^+)$ provided only that
$2 \le N_i^- \le N_i^+ < \infty$. It follows from these relations by application of (15) that

(16)    $c_i N_i^-/v \to 1$,    $c_i N_i^+/v \to 1$    $(i = 1, 2)$

as $v \to \infty$. It follows from (16), as desired, that (14) is satisfied.
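
The relations (14) and (16) can be checked numerically on an artificial example. The Python sketch below is not taken from the paper: the deterministic sequence $K_n = cn + 5\sin n$, the names `K`, `n_minus`, `n_plus`, and the slopes 0.5 and 2.0 are all assumptions chosen so that $K_n/n \to c$ holds trivially, as in (15).

```python
import math


def K(n, c, amp=5.0):
    """Hypothetical sequence with K(n)/n -> c; the sine term mimics fluctuation."""
    return c * n + amp * math.sin(n)


def n_minus(v, c, amp=5.0):
    """Least n >= 1 such that K(n + 1) >= v."""
    n = 1
    while K(n + 1, c, amp) < v:
        n += 1
    return n


def n_plus(v, c, amp=5.0):
    """Least m such that K(n) > v for all n >= m."""
    # For n >= upper we have c*n - amp > v, hence K(n) > v.
    upper = math.ceil((v + amp) / c) + 1
    m = upper
    while m > 1 and K(m - 1, c, amp) > v:
        m -= 1
    return m


if __name__ == "__main__":
    c1, c2 = 0.5, 2.0  # assumed slopes; phi = c1 / c2 = 0.25
    for v in (1e2, 1e3, 1e4):
        r_minus = n_minus(v, c2) / n_minus(v, c1)
        r_plus = n_plus(v, c2) / n_plus(v, c1)
        print(v, r_minus, r_plus)
```

Both ratios settle near $c_1/c_2$ as $v$ grows, which is the content of (14) for this toy sequence.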


4. Discussion. The following remarks are by way of discussion of the preceding 
two sections. 

1. The notion of a standard sequence is by no means essential to stochastic
comparison. Suppose for example that $\{T_n^{(i)}\}$ satisfies condition I with $F^{(i)}$ and
condition III with $b_i$, ($i$ = 1 and 2), that $F^{(1)} = F^{(2)} = F$, and that the common
limiting distribution function $F$ is strictly increasing. For each $s$ and $n$, let $L_n^{(i)}$
be the level attained by $T_n^{(i)}$. Then $L_n^{(i)} < L_n^{(j)}$ if and only if $T_n^{(j)} < T_n^{(i)}$,
$(i, j = 1, 2)$. It follows, exactly as in Section 3, that $b_1^2/b_2^2$ serves as the asymptotic
efficiency of sequence 1 relative to sequence 2. In particular, when a given
non-null $\theta$ obtains, sequence 1 is asymptotically inferior to, or equivalent to, or
superior to sequence 2 according as $b_1(\theta)$ <, or =, or > $b_2(\theta)$.

This last criterion was suggested by Anderson and Goodman ( [2], pp. 108-109) 
in the context of chi-square and likelihood ratio tests of certain contingency 
tables. Their suggestion seems to be the first explicit reference to stochastic com- 
parison in the literature. 

2. Suppose that in Sections 2 and 3 the index n is restricted to a subset of the 
positive integers. It is easily seen that the various definitions and conclusions 
remain valid in this case, except possibly for (14). However, the proof of (14) 
goes through if the following condition is satisfied: with $j_1 < j_2 < \cdots$ the sequence
of values of $n$, $j_r/j_{r+1} \to 1$ as $r \to \infty$. This condition also ensures the
existence of asymptotically equivalent sample sizes in the sense of (13).

3. In the paragraph preceding (14) in Section 3, the random variables $R^-$ and
$R^+$ are well defined even if neither of the two standard sequences is strongly consistent.
It would be interesting to know whether (14) holds in this case with the
almost everywhere limits replaced by limits in probability.

4. Suppose that it is desired to make an asymptotic comparison of two se- 
quences of tests, which happen to be based on standard sequences of real valued 
statistics. The verification that this last is the case, and the determination of the 
respective asymptotic slopes, requires little knowledge of the exact distributions 
of the individual members of each sequence of statistics. Consequently, the 
method of this paper is much more readily applicable than comparisons based 
explicitly on power functions (cf., e.g., [3], [4], [5], [6], [7]), since the latter com- 
parisons necessarily require detailed knowledge of the exact distributions of in- 
dividual statistics at least in the non-null case. This remark is supported by the 
examples given in the following section. 

5. Although stochastic comparison as formulated in Sections 2 and 3 makes no 
reference to power function considerations, there is a formal connection between 
the asymptotic slope of a standard sequence and the asymptotic power of the 
corresponding sequence of tests. This connection is discussed in the appendices
to this paper. It is pointed out in Appendix 1 that $\varphi$ can be regarded as the asymptotic
relative efficiency when the power is held fixed (or at least bounded away
from 0 and 1) and the resulting test sizes are compared. This fact (cf. also the
last sentence of remark 6 below) is stated here not so much as a justification of
stochastic comparison as a comment on the numerical value of $\varphi$.

6. Suppose that $\Omega$ is a metric space, and that $\Omega - \Omega_0$ is dense in $\Omega$. Let $\{T_n^{(1)}\}$
and $\{T_n^{(2)}\}$ be standard sequences, and let $\varphi_{1,2}(\theta)$ be defined by (9). Suppose that,
for each $\theta_0$ in $\Omega_0$, $\varphi_{1,2}$ has a limit as $\theta \to \theta_0$ through values in $\Omega - \Omega_0$, and call
this limit $\psi_{1,2}(\theta_0)$.

Limiting efficiency functions such as $\psi$ have a special role in any asymptotic
theory of comparison, for the following reason: if the experimentation is undertaken
mainly for the purpose of testing $H$, large sample sizes will in practice
occur in the non-null case only if $\theta$ is in the neighbourhood of some point in $\Omega_0$.
It is therefore of some interest that alternative methods of asymptotic comparison
often lead to the same limiting efficiency function. In particular, as is shown
in Appendix 2, it is quite generally true that the limiting efficiency $\psi$ derived in
the preceding paragraph coincides with Pitman's efficiency function in cases
where Pitman's theory is also applicable.

7. Given a parameter space $\Omega$ of points $\theta$ and a hypothesis $H$ concerning the
value of $\theta$, suppose that $\{T_n^{(i)}\}$ is a standard sequence (for testing $H$) defined on
a sample space $S^{(i)}$ of points $s^{(i)}$, $i = 1, 2$. Let $S = S^{(1)} \times S^{(2)}$ be the set of all
pairs $s = (s^{(1)}, s^{(2)})$, and for each $\theta$ in $\Omega$ let $P_\theta$ be any probability measure on $S$
which is consistent with the marginal distributions of the $s^{(i)}$. Then both sequences
$\{T_n^{(i)}\}$ are standard sequences defined on $S$, and the arguments of Section 3 apply
verbatim.

In other words, stochastic comparison can be applied even in cases where the
two sequences are not defined on the same sample space to begin with, e.g., if
$S^{(1)}$ and $S^{(2)}$ are the spaces of alternative experiments. It follows, in particular,
that if $\{T_n^{(i)}\}$ is a natural or optimum sequence on $S^{(i)}$ then $\varphi_{1,2}$ is, in a sense, the
asymptotic efficiency of experiment 1 relative to that of experiment 2. This application
is discussed in more detail in [8]. In this application, the limiting
efficiency $\psi_{1,2}$ corresponds to the relative "amount of information per observation"
in the theory of estimation.

8. The formulation of Section 2 can be generalised so as to include the case
when for each $n$ the level attained by the statistic $T_n$ is defined in terms of a
distribution function depending on $n$. One such generalisation is the following. Let
$\{T_n\}$ be a sequence of real valued statistics such that conditions I and III are
satisfied. For each $n$, let $F_n(x)$ be a distribution function, to be thought of as the
null distribution function of $T_n$, such that the following condition II* is satisfied:

II*. (i) $\lim_{n \to \infty} F_n(x) = F(x)$, and (ii) there exists a function $f$ on $(0, \infty)$
into $(0, \infty)$ such that, for any given sequence $\{u_n\}$ of positive constants $u_n$
such that $\lim_{n \to \infty} \{u_n^2/n\} = z$, where $0 < z < \infty$, we have

        $2n^{-1} \log [1 - F_n(u_n)] = -f(z)[1 + o(1)]$.

For each $s$ and $n$, let $L_n(T_n) = 1 - F_n(T_n)$, and let $K_n = -2 \log L_n(T_n)$. It
can then be shown that (5) and (7) continue to hold, with $c$ defined by

(6*)    $c(\theta) = 0$ for $\theta \in \Omega_0$,    $c(\theta) = f[b^2(\theta)]$ for $\theta \in \Omega - \Omega_0$.

The proofs, though not entirely trivial, are omitted.

In the special case when $F$ satisfies condition II, and $F_n = F$ for each $n$, II*
is also satisfied with $f(z) = az$, and the formula (6*) reduces to (6).

Let us say that $\{T_n\}$ is a standard sequence in the strict sense if there exists a
sequence $\{F_n\}$ such that $P(T_n < x \mid \theta) = F_n(x)$ for each $n$, $x$, and $\theta$ in $\Omega_0$, and
such that I, II*, and III are satisfied by $\{T_n\}$ and $\{F_n\}$. In this case $c(\theta)$ defined
by (6*) serves as the exact slope of the tests based on $\{T_n\}$, and this can be compared
with other slopes (exact or otherwise) as in Section 3. It is clear, however,
that determination of the exact slope (assuming that it exists) is as difficult in
concrete cases as exact analyses based on power function considerations.

9. For $i$ = 1 and 2, let $\{T_n^{(i)}\}$ be a sequence satisfying I, II, II* and III. Suppose
$\varphi_{1,2}$ is the efficiency function derived in Section 3, and $\psi_{1,2}$ the limiting
efficiency function derived from $\varphi_{1,2}$. Let $\varphi_{1,2}^*$ be the efficiency function obtained
by comparison of the exact slopes, and $\psi_{1,2}^*$ the limiting efficiency function derived
from $\varphi_{1,2}^*$. In the examples available at present where the conditions of this
paragraph are satisfied, $\varphi_{1,2}$ differs from $\varphi_{1,2}^*$ at every non-null $\theta$, but $\psi_{1,2} = \psi_{1,2}^*$
at every null $\theta$ (cf. example 1 in Section 5). It is not difficult to formulate
general sufficient conditions in order that $\psi_{1,2} = \psi_{1,2}^*$, but perhaps it would be
more useful to discover and study further examples of sequences which possess
exact slopes.

10. In the examples of stochastic comparison given in the following section, 
the level attained by a statistic T, is defined as in Section 2. As was stated in 
Section 2, this procedure is generally inexact. It is therefore of some importance 
to consider whether it is really useful to compute $\varphi(\theta) = c_1(\theta)/c_2(\theta)$, and to
study $\varphi$ as a function of $\theta$, unless it is known that $c_1$ and $c_2$ are the exact slopes
of the sequences being compared. A categorical answer to this question should
await the study of further examples, and of certain theoretical problems. The
author's opinion at present is that conclusions based on an inexact $\varphi$ are necessarily
tentative, but that such conclusions may well prove useful, especially in
cases (e.g. examples 2 and 3 in Section 5) where no exact methods of comparison 
are available at present. Some of the issues involved here are mentioned in the 
following paragraphs. 

The formal content of this paper is essentially descriptive. Given a standard
sequence (or more generally, a sequence satisfying I, II* and III) a description
of the asymptotic behaviour of the sequence is given in Section 2, and it is pointed
out in Section 3 that two such descriptions admit a direct and intuitively plausible
comparison. Consequently, whether or not $c_1$ and $c_2$ are exact slopes, $\varphi = c_1/c_2$
is an exact relative efficiency in the sense that it is based on an accurate description
of what happens in the limit when the prescribed methods of computing
levels are used.

If the prescribed methods of computing levels are inexact, the plausibility and 
usefulness of the present descriptions is diminished by the following considera- 
tions. The usual inexact methods (e.g. referring a contingency table chi-square 
to the chi-square distribution) are not intended for computation of very low 
probabilities. Consequently, if a non-null @ obtains, and n is sufficiently large, 
the chances are that the prescribed methods will be abandoned, or at least that 
the levels computed thereby will not be taken seriously. A related consideration 
is that the inexact slope $c$ of a statistic $T_n$ can scarcely be said to describe the
actual performance of $T_n$, since $c$ incorporates computational errors of unknown
magnitude and direction. Consequently, if in a given case $\varphi(\theta) = c_1/c_2 = 2$ (say),
it cannot be concluded that $T_n^{(1)}$ is really twice as efficient as $T_n^{(2)}$, or even that
$T_n^{(1)}$ is really more efficient than $T_n^{(2)}$, when $\theta$ obtains. There are examples showing
that this objection to the comparison of inexact slopes is not purely hypothetical,
i.e., that the values of an inexact $\varphi$ can indeed be misleading (cf., e.g., the last
part of example 2 in Section 5).

There is, however, some reason to think that the numerical value of an inexact
$\varphi$ can be very misleading only if $\theta$ is far from $\Omega_0$. In particular, the limiting
efficiency $\psi$ derived from an inexact $\varphi$ often coincides with the limiting efficiency
functions derived by exact methods of comparison (cf. remarks 6 and 9). It is
perhaps fair to say that such value as a given method of asymptotic comparison
may have stems mainly from the limiting efficiency function obtainable by that
method. If so, the comparison of inexact slopes affords, or at least promises, a
very short cut to the main conclusions of exact analyses.


5. Examples. It is convenient to note at the outset of this section that the
following distribution functions $F$ satisfy condition II of Section 2:
$F^{(1)}(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} \exp[-\tfrac{1}{2}t^2]\,dt$, with $a = 1$; $F_k^{(2)}(x) = P(\chi_k^2 < x^2)$, where $\chi_k^2$ denotes a
chi-square variable with $k$ d.f. $(1 \le k < \infty)$, also with $a = 1$; and
$F^{(3)}(x) = 1 - 2\sum_{r=1}^{\infty} (-1)^{r-1} \exp[-2r^2x^2]$, with $a = 4$.

That $F^{(1)}$ satisfies (2) with $a = 1$ follows from ([9], p. 166). To treat $F_k^{(2)}$, let
$m$ be a positive integer such that $2m \ge k$. For any $x > 0$ we then have

        $2[1 - F^{(1)}(x)] = P(\chi_1^2 > x^2) \le P(\chi_k^2 > x^2)$
                $= 1 - F_k^{(2)}(x) \le P(\chi_{2m}^2 > x^2) = P(Z \le m - 1)$,

where $Z$ is a Poisson variable with mean $\tfrac{1}{2}x^2$. It follows from the result for $F^{(1)}$
and from a direct calculation, respectively, that the lower and upper bounds for
$1 - F_k^{(2)}$ are both of the form $\exp[-\tfrac{1}{2}x^2(1 + o(1))]$; hence $1 - F_k^{(2)}$ is also of
the same form, i.e. (2) is satisfied with $a = 1$. The verification in the case of
$F^{(3)}$ is straightforward from the definition of $F^{(3)}$.
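
The constants $a$ quoted above can be examined numerically. The following Python sketch is an illustration only and assumes that condition II has the form $\log[1 - F(x)] = -\tfrac{1}{2}ax^2[1 + o(1)]$, so that $-2\log[1 - F(x)]/x^2$ should approach $a$ for large $x$. The function names are hypothetical; the chi-square tail is evaluated only for even degrees of freedom $2m$, through the Poisson identity used in the text.

```python
import math


def tail_normal(x):
    """1 - F(x) for the standard normal distribution function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_chi_even(x, m):
    """P(chi-square with 2m d.f. > x^2) = P(Z <= m - 1), Z Poisson with mean x^2/2."""
    lam = 0.5 * x * x
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= lam / j
        total += term
    return math.exp(-lam) * total


def tail_kolmogorov(x, terms=50):
    """1 - F(x) for the Kolmogorov limit: 2 * sum_{r>=1} (-1)^(r-1) exp(-2 r^2 x^2)."""
    return 2.0 * sum(
        (-1) ** (r - 1) * math.exp(-2.0 * r * r * x * x) for r in range(1, terms + 1)
    )


def a_estimate(tail_value, x):
    """Finite-x approximation to the constant a of condition II."""
    return -2.0 * math.log(tail_value) / (x * x)


if __name__ == "__main__":
    for x in (5.0, 10.0, 30.0):
        print(x, a_estimate(tail_normal(x), x), a_estimate(tail_chi_even(x, 2), x))
    for x in (2.0, 5.0, 10.0):
        print(x, a_estimate(tail_kolmogorov(x), x))
```

The convergence is slow, of order $(\log x)/x^2$, so the printed values are close to 1 and 4 without being equal to them.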

In the examples of stochastic comparison that follow, every sequence $\{T_n\}$
introduced in a particular context is a standard sequence in that context, and
the asymptotic null distribution is either $F^{(1)}$, or $F_k^{(2)}$ (for some $k = 1, 2, \cdots$),
or $F^{(3)}$. Except possibly for sequence 1 in example 3 and sequence 3 in example 5,
each $\{T_n\}$ is strongly consistent. Throughout the remainder of this section
$G$ denotes the standardized normal distribution function, i.e. $G(x) =
\int_{-\infty}^{x} (2\pi)^{-1/2} \exp[-t^2/2]\,dt$.

EXAMPLE 1. Let $s = (x_1, x_2, \cdots$ ad inf), where the $x_n$ are independent random
variables with each $x_n$ distributed according to $G([x - \mu]/\sigma)$, where $\mu$ and $\sigma > 0$
are entirely unknown. Let $\theta = (\mu, \sigma)$, and let $H$ be the hypothesis that $\mu = 0$.

For each $n = 1, 2, \cdots$ let $U_n$ = the number of positive $x_i$'s in the set
$\{x_1, x_2, \cdots, x_n\}$, and let $T_n^{(1)} = |2U_n - n|/n^{1/2}$. For each $n = 2, 3, \cdots$ let $T_n^{(2)}$
denote the $t$ statistic based on $\{x_1, x_2, \cdots, x_n\}$. Then for any $\theta = (\mu, \sigma)$ with
$\mu \ne 0$, the efficiency of sequence 1 relative to sequence 2 is

(17)    $\varphi_{1,2}(\theta) = [2G(\Delta) - 1]^2/\Delta^2$, where $\Delta = \mu/\sigma$.

This efficiency function is different from the ones derived in [1] by comparison
of exact levels, and also from the one derived in [7] by comparison of power functions.
However, we have $\psi_{1,2}(0, \sigma) = \lim_{\mu \to 0} \varphi_{1,2}(\mu, \sigma) = 2/\pi$ for every $\sigma$, and the
same is true of the other efficiency functions cited. It should be noted that $\varphi_{1,2}$
is a decreasing function of $|\Delta|$.

Now suppose that $\sigma$ is known and for each $n$ let $T_n^{(3)}$ be Kolmogorov's statistic,
i.e. $T_n^{(3)} = n^{1/2} \sup_x |K_n(x) - G(x/\sigma)|$ where $K_n$ denotes the distribution function
with masses $1/n$ at each of the points $x_1, x_2, \cdots$, and $x_n$. It follows from
Kolmogorov's theorem, and the Glivenko-Cantelli theorem, that the asymptotic
slope of $\{T_n^{(3)}\}$ is $4\delta^2$, where $\delta = \sup_x |G(x - \Delta) - G(x)|$. It follows hence that

(18)    $\varphi_{3,2}(\theta) = [2G(\Delta/2) - 1]^2/(\Delta/2)^2$.

Since we always have $\varphi_{1,3} = \varphi_{1,2}/\varphi_{3,2}$, it follows from (17) and (18) that
$\varphi_{1,3}(\theta) < 1$, that $\psi_{1,3}(\sigma) = 1$ for all $\sigma$, and that $\varphi_{1,3} \to 1/4$ as $|\Delta| \to \infty$ (the
slopes $[2G(\Delta) - 1]^2$ and $4\delta^2$ tend to 1 and 4 respectively).
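
The efficiency functions (17) and (18) are easy to tabulate. The short Python sketch below evaluates them and the ratio $\varphi_{1,3} = \varphi_{1,2}/\varphi_{3,2}$; the function names are chosen for this illustration and are not from the paper.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi_12(delta):
    """(17): sign-type statistic relative to the t statistic, delta = mu/sigma != 0."""
    return (2.0 * G(delta) - 1.0) ** 2 / delta ** 2


def phi_32(delta):
    """(18): Kolmogorov's statistic relative to the t statistic."""
    half = delta / 2.0
    return (2.0 * G(half) - 1.0) ** 2 / half ** 2


def phi_13(delta):
    """Ratio phi_12 / phi_32."""
    return phi_12(delta) / phi_32(delta)


if __name__ == "__main__":
    for delta in (0.01, 0.5, 1.0, 2.0, 5.0, 20.0):
        print(delta, phi_12(delta), phi_32(delta), phi_13(delta))
    print("2/pi =", 2.0 / math.pi)
```

For small $\Delta$ both $\varphi_{1,2}$ and $\varphi_{3,2}$ are close to $2/\pi$, so their ratio is close to 1.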

EXAMPLE 2. Let $s = (x_1, x_2, \cdots$ ad inf) where the $x_n$ are independently distributed
according to $G(x/\sigma)$, where $\sigma$ is entirely unknown. $H$ is the hypothesis
$\sigma = 1$.

Let $T_n^{(1)}$ be Kolmogorov's statistic based on $\{x_1, x_2, \cdots, x_n\}$ and let
$T_n^{(2)} = |(2\sum_{i=1}^{n} x_i^2)^{1/2} - (2n)^{1/2}|$. It is then found that $\psi_{1,2} = \lim_{\sigma \to 1} \varphi_{1,2}(\sigma) = 1/(\pi e) \approx
12/100$, but the function $\varphi_{1,2}$ need not be given here.

Next, let $T_n^{(3)}$ be the sequence obtained by normalizing the best estimate of
$\sigma^2$, i.e. $T_n^{(3)} = |(\sum_{i=1}^{n} x_i^2 - n)/(2n)^{1/2}|$. We then have

(19)    $\varphi_{2,3}(\sigma) = 4/(1 + \sigma)^2$.

That $\varphi_{2,3}$ is not $\equiv 1$ is due entirely to the fact that the common asymptotic distribution
function for sequences 2 and 3 (i.e. $F_k^{(2)}$ with $k = 1$) does not provide the
exact levels attained for a given $n$.
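
The value $1/(\pi e)$ and formula (19) can be reproduced numerically. The Python sketch below rests on slopes worked out here from the definitions of the three statistics, which is an assumption of this illustration rather than something stated in the paper: $c_1 = 4\delta^2$ with $\delta = \sup_x |G(x/\sigma) - G(x)|$, $c_2 = 2(\sigma - 1)^2$ and $c_3 = (\sigma^2 - 1)^2/2$. The supremum is approximated on a grid, and all function names are hypothetical.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def delta(sigma, grid=5000, upper=6.0):
    """Grid approximation to sup_x |G(x/sigma) - G(x)|; x >= 0 suffices by symmetry."""
    step = upper / grid
    return max(abs(G(i * step / sigma) - G(i * step)) for i in range(grid + 1))


def slope_1(sigma):
    return 4.0 * delta(sigma) ** 2        # Kolmogorov's statistic (a = 4)


def slope_2(sigma):
    return 2.0 * (sigma - 1.0) ** 2       # assumed limit of (T2)^2 / n


def slope_3(sigma):
    return 0.5 * (sigma ** 2 - 1.0) ** 2  # assumed limit of (T3)^2 / n


def phi_12(sigma):
    return slope_1(sigma) / slope_2(sigma)


def phi_23(sigma):
    return slope_2(sigma) / slope_3(sigma)


if __name__ == "__main__":
    print("phi_12 near sigma = 1:", phi_12(1.001), " 1/(pi e) =", 1.0 / (math.pi * math.e))
    for sigma in (0.5, 2.0, 4.0):
        print(sigma, phi_23(sigma), 4.0 / (1.0 + sigma) ** 2)
```

Under these assumed slopes the ratio $c_2/c_3$ simplifies algebraically to $4/(1 + \sigma)^2$, in agreement with (19).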

EXAMPLE 3. Let $F_1(x)$ and $F_2(x)$ be probability distribution functions on the
real line, such that $dF_j = f_j(x)\,dx$ where $f_j$ is a continuous function of $x$, except
possibly at a finite number of points, $(j = 1, 2)$. Let $s = (x^{(1)}; x^{(2)})$ where
$x^{(1)} = (x_1^{(1)}, x_2^{(1)}, \cdots$ ad inf) and $x^{(2)} = (x_1^{(2)}, x_2^{(2)}, \cdots$ ad inf) are independent
sequences of independent random variables, with $x_n^{(j)}$ distributed according to
$F_j$, $(n = 1, 2, \cdots; j = 1, 2)$. Let $\theta = (F_1, F_2)$ and let $H$ be the hypothesis
that $F_1(x) \equiv F_2(x)$.

Let $k$ and $l$ be given positive integers with $k < l$, and write $p = k/l$, $q =
1 - p$. Assume henceforth in this example that $n$ is restricted to the set $l, 2l,
3l, \cdots$. For each $n$, let $m_1 = m_1(n) = np$ and $m_2 = m_2(n) = nq$.

For each $n$, let $s_n = (x_1^{(1)}, x_2^{(1)}, \cdots, x_{m_1}^{(1)}; x_1^{(2)}, x_2^{(2)}, \cdots, x_{m_2}^{(2)})$. Let $U_n$ denote
the statistic of Stevens and Wald and Wolfowitz [10] when the datum is $s_n$, i.e.
$U_n$ = the total number of runs of superscripts 1 or 2 when the $n$ elements of $s_n$
are arranged in ascending order. It follows from the results given in [10] that,
for any $\theta$, $U_n/n$ converges in probability to $2pq[1 - \gamma]$, where

(20)    $\gamma(\theta) = pq \int_{-\infty}^{\infty} \frac{[f_1(x) - f_2(x)]^2}{p f_1(x) + q f_2(x)}\,dx$.

This form of the consistency theorem of Wald and Wolfowitz is due to Pitman
[3]. Now let $T_n^{(1)} = [\mu(n) - U_n]/\sigma(n)$ where $\mu$ and $\sigma^2$ are the mean and variance
of $U_n$ when $H$ obtains. It then follows by referring to the results in [10] for the
null case that $\{T_n^{(1)}\}$ is a standard sequence, and that its slope is

(21)    $c_1(\theta) = \gamma^2$,

where $\gamma$ is given by (20).

Next, let $T_n^{(2)}$ be the statistic of Smirnov, i.e.

        $T_n^{(2)} = (npq)^{1/2} \sup_x |K_n^{(1)}(x) - K_n^{(2)}(x)|$,

where $K_n^{(j)}$ is the distribution function with masses $1/m_j$ at each of the points
$x_1^{(j)}, \cdots, x_{m_j}^{(j)}$, $(j = 1, 2)$. It follows from the theorem of Smirnov and the
Glivenko-Cantelli theorem that the slope of $\{T_n^{(2)}\}$ is

(22)    $c_2(\theta) = 4pq\delta^2$, where $\delta = \sup_x |F_1(x) - F_2(x)|$.

Consequently, the efficiency of the Wald-Wolfowitz test relative to the Smirnov
test is

(23)    $\varphi_{1,2}(\theta) = \gamma^2/4pq\delta^2$.

It is seen from (20) and (23) that if $(f_1 - f_2)^2/\min\{f_1, f_2\}$ is integrable, $\varphi \to 0$
as $p \to 0$ or 1, i.e. the relative efficiency of sequence 1 is very small if the two
sample sizes $m_1$, $m_2$ are very different. It is also seen from (20) and (22) that if
$F_1$ and $F_2$ are both members of a sufficiently smooth parametric family of distribution
functions, and if $F_1$ is close to $F_2$, then $\varphi$ will again be nearly zero, for $\gamma$
will then be of the order of magnitude of $\delta^2$. This is the case, for example, if
(a) $F_1 = G(x)$, $F_2 = G(x - \Delta)$, and $|\Delta|$ is small, or if (b) $F_1 = G(x)$, $F_2 =
G(x/\sigma)$, and $\sigma$ is nearly 1.

We observe next that regularity conditions are essential to the arguments of
the preceding paragraph. Thus if (c) $f_1 = 1$ on (0, 1) and 0 elsewhere, and $f_2 =
f_1(x - \Delta)$, we have $\varphi = 1/(4pq)$ for all $\Delta$. A different irregular case is (d)
$f_1 = 1$ on (0, 1) and 0 elsewhere, and $f_2 = 1 + \sin(2\pi kx)$ on (0, 1) and 0 elsewhere,
where $k$ is a positive integer. In this case $\delta = 1/(\pi k)$ and $\varphi = k^2 \cdot \lambda(p)$,
where $\lambda$ is a positive constant independent of $k$. By taking $k$ sufficiently large we
see that sequence 1 can be much more efficient than sequence 2 even though $F_1$
is close to $F_2$ in the sense that $\delta$ is small.
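
Cases (c) and (d) can be verified by direct numerical integration of (20), (22) and (23). The Python sketch below uses a midpoint rule; the function names, the grid size, and the particular values of $p$, $\Delta$ and $k$ are assumptions made for this illustration. Case (c) is evaluated for a shift $0 < \Delta < 1$.

```python
import math


def gamma_delta(f1, f2, p, lo, hi, n=20000):
    """Midpoint-rule values of gamma in (20) and delta = sup |F1 - F2| in (22)."""
    q = 1.0 - p
    h = (hi - lo) / n
    gamma = 0.0
    diff_cdf = 0.0  # running value of F1(x) - F2(x)
    delta = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        a, b = f1(x), f2(x)
        mix = p * a + q * b
        if mix > 0.0:  # the integrand is taken as 0 where both densities vanish
            gamma += p * q * (a - b) ** 2 / mix * h
        diff_cdf += (a - b) * h
        delta = max(delta, abs(diff_cdf))
    return gamma, delta


def efficiency(f1, f2, p, lo, hi, n=20000):
    """(23): gamma^2 / (4 p q delta^2)."""
    gamma, delta = gamma_delta(f1, f2, p, lo, hi, n)
    return gamma ** 2 / (4.0 * p * (1.0 - p) * delta ** 2)


def uniform01(x):
    return 1.0 if 0.0 < x < 1.0 else 0.0


def shifted(shift):
    return lambda x: uniform01(x - shift)


def wavy(k):
    return lambda x: (1.0 + math.sin(2.0 * math.pi * k * x)) if 0.0 < x < 1.0 else 0.0


if __name__ == "__main__":
    p = 0.3
    print("case (c):", efficiency(uniform01, shifted(0.25), p, 0.0, 2.0), 1.0 / (4 * p * (1 - p)))
    for k in (1, 2, 4, 8):
        g, d = gamma_delta(uniform01, wavy(k), p, 0.0, 1.0)
        print("case (d): k =", k, "delta =", d, "phi / k^2 =", g * g / (4 * p * (1 - p) * d * d) / k ** 2)
```

In case (d) the printed values of $\varphi/k^2$ agree across $k$, which is the statement $\varphi = k^2 \cdot \lambda(p)$.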

Suppose for a moment that in case (d) it is required to discriminate between
$F_1$ and $F_2$ (with a given large $k$) on the basis of a single observation $x$. It is then
clear that discriminant functions such as $x$ or $|x|$ are practically useless (because
$\delta$ is small) and also that, in comparison to the optimum criterion of Neyman and
Pearson, $x$ and $|x|$ are very inefficient (because $f_2(x)/f_1(x)$ is far from monotonic
in $x$ or $|x|$). We shall show that $\varphi$ can be very large only in the rather
extreme cases where both these conditions are satisfied. More precisely, it will be
shown that in general

(24)    $\varphi_{1,2}(\theta) \le (1/4pq) \min\{\xi^2, (\delta_1/2\delta)^2\}$

where $\delta_1$ is the $L_1$ distance between $f_1$ and $f_2$ $(0 \le \delta_1 \le 2)$, and $\xi$ is essentially
the least upper bound to the number of times that the graph of $y = f_2(x)/f_1(x)$
crosses any line $y$ = const. It follows from (24), in particular, that if $p = q = \tfrac{1}{2}$
and $f_2/f_1$ is monotonic then necessarily $\varphi \le 1$.

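To establish (24), let $\theta = (F_1, F_2)$ and $p$ be given, with $dF_j = f_j\,dx$, and let
$\gamma$ be defined by (20). Define $g(x) = p f_1(x)/[p f_1(x) + q f_2(x)]$ if $f_1(x) + f_2(x) > 0$
and $g = p$ (say) otherwise. Let $F_0 = pF_1 + qF_2$. Then

        $\int_{-\infty}^{\infty} g\,dF_0 = \int_{-\infty}^{\infty} p\,dF_1 = p$.

Consequently

(25)    $\gamma = (pq) \int_{-\infty}^{\infty} [(f_1 - f_2)/(p f_1 + q f_2)]^2\,dF_0$
          $= (pq) \int_{-\infty}^{\infty} [(g - p)/pq]^2\,dF_0$
          $= \int_{-\infty}^{\infty} [(g - p)/pq] \cdot g \cdot dF_0$
          $= \int_{-\infty}^{\infty} g\,(dF_1 - dF_2)$
          $= \mu_1 - \mu_2$,

where $\mu_j = \int_{-\infty}^{\infty} g\,dF_j$ $(j = 1, 2)$. It follows from (25), by a well known representation
of the expected value of a random variable, that

(26)    $\gamma = \int_0^1 [G_2(y) - G_1(y)]\,dy$,

where $G_j(y) = \Pr(g(x) \le y \mid F_j)$, $j = 1, 2$.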





Now let $P_j$ denote the probability measure on the Borel sets of the real line
corresponding to $F_j$, $j = 0, 1, 2$. It is evident from (26) that

        $\gamma \le \sup_y \{G_2(y) - G_1(y)\} \le \sup_A \{P_2(A) - P_1(A)\}$.

As is easily seen, $\sup_A \{P_2(A) - P_1(A)\} = \tfrac{1}{2}\int_{-\infty}^{\infty} |f_1 - f_2|\,dx = \tfrac{1}{2}\delta_1$. Hence
$\gamma \le \delta_1/2$.

Next, for any interval $I$ on the real line define $\zeta(I)$ as follows: $\zeta = 0$ if
$P_0(I) = 0$ or if $P_0(I) = 1$; $\zeta = 1$ if $0 < P_0(I) < 1$ but $I$ is unbounded; and
$\zeta = 2$ in the remaining case. It is then easily seen that $P_2(I) - P_1(I) \le \zeta(I) \cdot \delta$
for all $I$, where $\delta$ is given by (22). Now choose and fix a $y$, $0 < y < 1$, and an
$\epsilon > 0$. Let $A = \{x: g(x) \le y\}$. Let $\eta(y, \epsilon)$ be the infimum of $\zeta(I_1) + \zeta(I_2) +
\cdots + \zeta(I_k)$ over all finite collections $\{I_1, I_2, \cdots, I_k\}$ of disjoint intervals $I_r$
such that with $B = I_1 + I_2 + \cdots + I_k$ we have $P_2(A) - P_1(A) \le P_2(B) -
P_1(B) + \epsilon$. It is then clear that $P_2(A) - P_1(A) \le \eta(y, \epsilon) \cdot \delta + \epsilon$. Since $\epsilon$ is arbitrary,
we have that $P_2(A) - P_1(A) \le \eta(y) \cdot \delta$, where $\eta(y) = \lim_{\epsilon \to 0} \eta(y, \epsilon)$,
$0 \le \eta(y) \le \infty$. Assuming that $\eta$ is a measurable function of $y$, it follows
from the present definition of $A$ and (26) that $\gamma \le \delta \cdot \int_0^1 \eta(y)\,dy$. In any
case, if $\xi$ is the essential supremum of $\eta(y)$ $(\le \infty)$ we have $\gamma \le \xi \cdot \delta$. Thus
$\gamma \le \min\{\xi \cdot \delta, \delta_1/2\}$, and (24) now follows from (23).

The argument of the preceding paragraph is valid without any restrictions on
$f_1$ and $f_2$, but the final result is nontrivial (i.e. $\xi < \infty$) only under certain conditions.
The reader may verify, in particular, that if there exists a set $B$ with
$P_0(B) = 1$ such that $g$ or $-g$ is non-decreasing on $B$ then $\xi \le 1$. More generally,
if for some $k = 1, 2, \cdots$ it is possible to find disjoint intervals $I_1, \cdots, I_k$ such
that $\sum_r P_0(I_r) = 1$ and such that $g$ is essentially monotonic on each $I_r$ then
$\xi \le k$.

In concluding this discussion of example 3, let us note that the numerical value
of $\gamma$ depends on $F_1$ and $F_2$ only through the error probabilities in the Neyman-Pearson
theory of testing $F_1$ against $F_2$ given $x$. This dependence can be made
explicit as follows. For any subset $A$ of the real line let $\alpha(A) = P_1(A)$ and
$\beta(A) = 1 - P_2(A)$. $\alpha$ and $\beta$ are then the errors of the first and second kind in
using $A$ as the critical region. For any $z$, $0 < z < \infty$, let $r(z) = \alpha(A_z) + \beta(A_z)$
where $A_z = \{x: f_2(x) \ge z f_1(x)\}$. It follows from (26) by a straightforward calculation
that

(27)    $\gamma = \int_0^{\infty} [1 - r(z)][pq/(p + qz)^2]\,dz$.


It follows from the preceding paragraph that the slope of the Wald-Wolfowitz 
test remains unchanged if each observation in the two samples is subjected to a 
1 — 1 transformation before being supplied to the statistician. This transforma- 
tion need not be continuous or monotonic—all that is required is that the distri- 
butions of the transformed variable also satisfy the conditions stated at the 
outset of this example. 


EXAMPLE 4. Let $s = (x^{(1)}; x^{(2)}; \cdots; x^{(k)})$ be $k$ independent sequences
$x^{(j)} = (x_1^{(j)}, x_2^{(j)}, \cdots$ ad inf) of independent random variables $x_m^{(j)}$ $(m = 1,
2, \cdots; j = 1, 2, \cdots, k)$. Let $F(x)$ be a continuous distribution function such
that

(28)    $\int_{-\infty}^{\infty} x\,dF = 0$,    $\int_{-\infty}^{\infty} x^2\,dF = 1$.

Let $\sigma > 0$ and $\mu_1, \mu_2, \cdots, \mu_k$ be constants and suppose that $x_m^{(j)}$ is distributed
according to $F([x - \mu_j]/\sigma)$, $(m = 1, 2, \cdots; j = 1, 2, \cdots, k)$. Here $\theta =
(F; \mu_1, \mu_2, \cdots, \mu_k; \sigma)$. $H$ is the hypothesis that $\mu_1 = \mu_2 = \cdots = \mu_k$.

For each $n = k, k + 1, \cdots$ let $m_1(n), \cdots, m_k(n)$ be positive integers such
that $n = m_1 + m_2 + \cdots + m_k$. It is assumed that

(29)    $\lim_{n \to \infty} m_j/n = p_j$, where $0 < p_j < 1$ $(j = 1, 2, \cdots, k)$.

For each $n$, let $s_n = (x_1^{(1)}, \cdots, x_{m_1}^{(1)}; \cdots; x_1^{(k)}, \cdots, x_{m_k}^{(k)})$, and let $T_n^{(1)}$ be the square
root of the statistic of Kruskal and Wallis ([11], Section 2). It then follows from
the results given in [11] that $\{T_n^{(1)}\}$ is a standard sequence, with slope $c_1$ defined
as follows. Let

(30)    $\Delta_{rs} = (\mu_r - \mu_s)/\sigma$,    $\alpha_{rs} = \int_{-\infty}^{\infty} [F(x + \Delta_{rs}) - F(x)]\,dF$,
        $\beta_j = \sum_{r=1}^{k} p_r \alpha_{jr}$,

for all $r$, $s$ and $j = 1, 2, \cdots, k$. Then

(31)    $c_1(\theta) = 12 (\sum_{j=1}^{k} p_j \beta_j^2)$.

Next, let $T_n^{(2)}$ denote the square root of the usual analysis of variance statistic
based on $s_n$. Then $\{T_n^{(2)}\}$ is also a standard sequence, and we have

(32)    $\varphi_{1,2} = 12 (\sum_{j=1}^{k} p_j \beta_j^2) / (\sum_{j=1}^{k} p_j \nu_j^2)$,

where

(33)    $\nu_j = \sum_{r=1}^{k} p_r \Delta_{jr}$ for $j = 1, 2, \cdots, k$.

Suppose now that $dF = f(x)\,dx$, and that $f$ is sufficiently regular so that
$\int_{-\infty}^{\infty} [F(x + \Delta) - F(x)]\,dF = \Delta \int_{-\infty}^{\infty} f(x)\,dF + \Delta \cdot o(1)$ as $\Delta \to 0$. It then
follows easily from (30), (32), and (33) that, for any $\theta_0 = (F; \mu, \mu, \cdots, \mu; \sigma)$,

(34)    $\psi_{1,2}(\theta_0) = \lim_{\theta \to \theta_0} \varphi_{1,2}(\theta) = 12 [\int_{-\infty}^{\infty} f\,dF]^2$.

It is shown in [7] that $\psi$ is never less than .864. On the other hand, since $|\alpha_{rs}| < 1$,
we have $|\beta_j| \le 1$ and hence $\varphi_{1,2} \le 12/(\sum_j p_j \nu_j^2)$. Consequently $\varphi_{1,2} \to 0$ as
$\max_j \{|\nu_j|\} \to \infty$, i.e. as the mean of at least one sequence $x^{(j)}$ becomes very
different from the weighted mean of the others.
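
To give a feeling for the size of the limiting efficiency in (34), the Python sketch below evaluates $12[\int f\,dF]^2 = 12[\int f^2\,dx]^2$ by the midpoint rule for two densities satisfying (28). The function name `psi`, the integration ranges, and the choice of densities are assumptions of this illustration.

```python
import math


def psi(f, lo, hi, n=20000):
    """Midpoint-rule value of 12 * (integral of f(x)^2 dx)^2 over [lo, hi]."""
    h = (hi - lo) / n
    integral = sum(f(lo + (i + 0.5) * h) ** 2 for i in range(n)) * h
    return 12.0 * integral ** 2


def normal_density(x):
    """Standard normal density (mean 0, variance 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


ROOT3 = math.sqrt(3.0)


def uniform_density(x):
    """Uniform density on (-sqrt(3), sqrt(3)), which has mean 0 and variance 1."""
    return 1.0 / (2.0 * ROOT3) if -ROOT3 < x < ROOT3 else 0.0


if __name__ == "__main__":
    print("normal :", psi(normal_density, -12.0, 12.0), " 3/pi =", 3.0 / math.pi)
    print("uniform:", psi(uniform_density, -ROOT3, ROOT3))
```

For the normal density the value is $3/\pi$, and for the uniform density it is 1; both exceed the bound .864 quoted from [7].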

EXAMPLE 5. Let $s = (v_1, v_2, \cdots$ ad inf) be a sequence of independent and
identically distributed random variables $v_n = (x_n, y_n)$, where $x$ and $y$ have a
bivariate normal distribution. $H$ is the hypothesis $\rho = 0$, where $\rho$ is the correlation
between $x$ and $y$.

For each $n$, let $r_n$ denote the sample product moment correlation based on
$s_n = (v_1, v_2, \cdots, v_n)$. Let $T_n^{(1)} = \tfrac{1}{2} n^{1/2} |\log[(1 + r_n)/(1 - r_n)]|$, and $T_n^{(2)} =
(n - 2)^{1/2} \cdot |r_n|/[1 - r_n^2]^{1/2}$. We then have


(35)    $\varphi_{1,2}(\theta) = \frac{1 - \rho^2}{4\rho^2} \left[\log \frac{1 + \rho}{1 - \rho}\right]^2$.

It is easily seen that $\varphi_{1,2}$ is a decreasing function of $|\rho|$, varying from 1 to 0.

Next, let $a_n$ = the median $x$ value in $s_n$, and $b_n$ = the median $y$ value in $s_n$.
Let $f_{1n}$ = the number of pairs $v_i$ in $s_n$ with $x_i > a_n$, $y_i > b_n$; $f_{2n}$ the number
with $x_i < a_n$, $y_i > b_n$; $f_{3n}$ the number with $x_i < a_n$, $y_i < b_n$; and $f_{4n}$ the number
with $x_i > a_n$, $y_i < b_n$. Let $T_n^{(3)}$ = the square root of the chi-square statistic
based on the 2 × 2 table of the four frequencies $f$. Then

(36)    $\varphi_{3,2} = (4/\pi^2) \cdot (1 - \rho^2) \cdot [\sin^{-1}(\rho)/\rho]^2$, and $\psi_{3,2}(\theta_0) = 4/\pi^2 \approx 41/100$.

Now let $r_n'$ be Spearman's coefficient of rank correlation based on $s_n$, and let
$r_n''$ be the difference sign covariance, i.e.

        $r_n'' = \sum_{i \ne j} \mathrm{sgn}(x_i - x_j) \cdot \mathrm{sgn}(y_i - y_j)/n(n - 1)$,

where sgn $(z)$ = +1, 0, or −1 accordingly as $z$ >, =, or < 0. Let $T_n^{(4)} =
|r_n'|/\sigma'(n)$ and $T_n^{(5)} = |r_n''|/\sigma''(n)$ where $\sigma'$ and $\sigma''$ are the standard deviations
of $r'$ and $r''$ respectively when $\rho = 0$. We then have, by using formulae given in
[12],

(37)    $\varphi_{4,2}(\theta) = (9/\pi^2) \cdot (1 - \rho^2) \cdot [\sin^{-1}(\rho/2)/(\rho/2)]^2$,
        $\varphi_{5,2}(\theta) = (9/\pi^2) \cdot (1 - \rho^2) \cdot [\sin^{-1}(\rho)/\rho]^2$.

It follows from (37) that $\psi_{4,2}(\theta_0) = \psi_{5,2}(\theta_0) = 9/\pi^2 \approx 91/100$. It also follows that
$\varphi_{4,5}$ is a decreasing function of $|\rho|$, varying from 1 to 4/9.
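
The efficiency functions of this example can be tabulated directly. The Python sketch below evaluates expressions of the form (35), (36) and (37) for $0 < \rho < 1$, together with the ratio $\varphi_{4,5} = \varphi_{4,2}/\varphi_{5,2}$ written in the simplified form $[2\sin^{-1}(\rho/2)/\sin^{-1}(\rho)]^2$. The function names are assumptions of this illustration.

```python
import math


def phi_12(rho):
    """(35): transformed correlation statistic relative to the t-type statistic."""
    return (1.0 - rho ** 2) / (4.0 * rho ** 2) * math.log((1.0 + rho) / (1.0 - rho)) ** 2


def phi_32(rho):
    """(36): median-split chi-square statistic relative to the t-type statistic."""
    return (4.0 / math.pi ** 2) * (1.0 - rho ** 2) * (math.asin(rho) / rho) ** 2


def phi_42(rho):
    """(37), first line: Spearman's coefficient."""
    half = rho / 2.0
    return (9.0 / math.pi ** 2) * (1.0 - rho ** 2) * (math.asin(half) / half) ** 2


def phi_52(rho):
    """(37), second line: difference sign covariance."""
    return (9.0 / math.pi ** 2) * (1.0 - rho ** 2) * (math.asin(rho) / rho) ** 2


def phi_45(rho):
    """phi_42 / phi_52 after cancelling the common factors; valid for 0 < rho <= 1."""
    return (2.0 * math.asin(rho / 2.0) / math.asin(rho)) ** 2


if __name__ == "__main__":
    for rho in (0.01, 0.25, 0.5, 0.75, 0.99):
        print(rho, phi_12(rho), phi_32(rho), phi_42(rho), phi_52(rho), phi_45(rho))
```

Writing $\varphi_{4,5}$ in cancelled form avoids the indeterminate quotient at $\rho = 1$, where both $\varphi_{4,2}$ and $\varphi_{5,2}$ vanish.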


APPENDICES 


The argument of this paper depends entirely on the practical principle that if
the null hypothesis does not obtain, and if in a given instance test statistic 1
attains the level $L_1$ while statistic 2 attains $L_2$, statistic 2 is superior in that
instance if and only if $L_2 < L_1$. As might be expected, this principle is closely
related to comparisons based on power function considerations. The connection
is discussed in the following appendices.


Appendix 1. It will be shown here that the asymptotic slope of a standard
sequence is a functional on the family of power functions associated with the
sequence of statistics (proposition 2), and that slopes are consistent with power
in the following sense: if the power of the test based on $T_n^{(1)}$ never exceeds that
of the corresponding test based on $T_n^{(2)}$, then $\varphi_{1,2} \le 1$ (proposition 3). These conclusions
are useful analytical tools in applications such as the one mentioned in
remark 7 of Section 4.

Consider a sample space $S$ of points $s$, a set $\{P_\theta: \theta \in \Omega\}$ of alternative distributions
on $S$, and a hypothesis $H: \theta \in \Omega_0$. Let $\{T_n\}$ be a (not necessarily standard)
sequence of real valued statistics such that, for each $\theta$ in $\Omega_0$,

(i)    $\lim_{n \to \infty} P_\theta(T_n < x) = F(x)$ for every $x$,

where $F$ is a probability distribution function, and such that, for each $\theta$ in $\Omega - \Omega_0$,

(ii)    $T_n \to \infty$ in probability.

For any given constant $\alpha$, $0 < \alpha < 1$, and each $n$, the size $\alpha$ test (of $H$) based
on $T_n$ is then defined to be the following procedure: reject $H$ if and only if
$1 - F(T_n) \le \alpha$. In general, this test is not literally of size $\alpha$, that is, it need not
satisfy

        $P_\theta(1 - F(T_n) \le \alpha) = \alpha$

for each $n$ and each $\theta$ in $\Omega_0$, but the present definition seems legitimate and useful
in view of the reasons stated in Section 2. For any $\theta$ in $\Omega$ and any $n$, let $1 -
\beta_n(\alpha \mid \theta)$ denote the power of the size $\alpha$ test based on $T_n$ when $\theta$ obtains, i.e.
$\beta_n(\alpha \mid \theta) = P_\theta(F(T_n) < 1 - \alpha)$.

Now consider a fixed $\theta$ in $\Omega - \Omega_0$ and a fixed $\alpha$. It is easily seen from (ii) that
$\beta_n \to 0$ as $n \to \infty$. It can be shown in certain cases that in fact $n^{-1} \log \beta_n \to -r/2$,
where $r$ is a positive constant depending on $\theta$ (and possibly also on $\alpha$). In
such cases, if $r_1$ and $r_2$ are the constants associated with two sequences $\{T_n^{(1)}\}$ and
$\{T_n^{(2)}\}$, $r_1/r_2$ is the asymptotic efficiency of sequence 1 relative to sequence 2 in
the following sense: $r_1/r_2$ is the (limiting) ratio of sample sizes required to attain
an assigned (arbitrarily small) probability of an error of type two. This method
of comparison is due to Hodges and Lehmann [7]. A very similar method was
devised earlier by Chernoff [6]. The method is, however, quite difficult to apply
because precise estimates of $\beta_n$ are required.

An alternative analysis which suggests itself is the dual of the preceding one,
i.e. to let $\theta$ and $\beta$ be fixed, say $\beta_n(\alpha_n \mid \theta) = \beta_0$, where $0 < \beta_0 < 1$, and to study
the rate at which $\alpha_n$ must then tend to zero. This approach was mentioned by
Cochran ([13], p. 323). It might appear at first sight that this second method
would be just as difficult as the first, but that is not the case. In the present
formulation, $\alpha$ and $\beta$ are not really interchangeable. Indeed, in the definition of
the power function, we have already exploited the lack of symmetry between
$\Omega_0$ and $\Omega - \Omega_0$ by replacing the set of null distribution functions

        $\{P_\theta(T_n < x): \theta \in \Omega_0, n = 1, 2, \cdots\}$

by the single distribution function $F(x)$. It follows from proposition 2 below that
if $\{T_n\}$ is a standard sequence then $n^{-1} \log \alpha_n \to -c/2$, where $c(\theta)$ is the slope
of $\{T_n\}$ as defined in Section 2. Consequently, $\varphi = c_1/c_2$ serves as the relative
efficiency of two standard sequences in this method of comparison.

A third method of comparison of power functions, due to Pitman [3], depends
essentially on fixing both $\alpha$ and $\beta$, say $\beta_n(\alpha \mid \theta_n) = \beta_0$, and studying the rate at
which $\theta_n$ must then tend to some null value. It will be shown in Appendix 2,
under essentially the same general conditions as are usually required for application
of this method (cf., e.g., [4]), that asymptotic efficiency in Pitman's sense
coincides with $\psi$, the limit of $\varphi$ as $\theta$ tends to a null value.

We proceed to establish the connection between the slope of a standard sequence
and the family of power functions associated with the sequence. In the
following propositions 1-4 we consider an arbitrary but fixed $\theta$ in $\Omega - \Omega_0$.

Proposition 1. Suppose that $\{T_n\}$ is a standard sequence with slope $c(\theta)$. For
any sequence $\{\alpha_n\}$ of values $\alpha_n$ in (0, 1), let

(iii) $v_n = 2 \log (1/\alpha_n)$.

Then

(iv) $\liminf_{n\to\infty} \{v_n/n\} < c(\theta)$ implies $\liminf_{n\to\infty} \{\beta_n(\alpha_n \mid \theta)\} = 0$,

and

(v) $\limsup_{n\to\infty} \{v_n/n\} > c(\theta)$ implies $\limsup_{n\to\infty} \{\beta_n(\alpha_n \mid \theta)\} = 1$.

Proof. Let $K_n$ be defined by (4). It then follows from the definition of $\beta_n$ and
(iii) that

(vi) $\beta_n(\alpha_n \mid \theta) = P_\theta(K_n < v_n)$.

As is shown in Section 2, $K_n/n \to c$ in probability. It follows hence from (vi)
that (iv) and (v) are valid.
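
As a rough numerical illustration of the convergence $K_n/n \to c$ used in this proof, the sketch below takes a hypothetical special case that is not worked out in the text: a one-sided test of a normal mean with unit variance, so that $F$ is the standard normal distribution function, $a = 1$, $b(\theta) = \theta$ and $c = \theta^2$. It assumes that $K_n = -2 \log[1 - F(T_n)]$, which is the form consistent with (iii) and (vi), and evaluates $K_n$ at the typical non-null value $T_n = \sqrt{n}\,\theta$. The function names are illustrative only.

```python
import math


def normal_upper_tail(t: float) -> float:
    """1 - Phi(t) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def k_statistic(t: float) -> float:
    """K = -2 log[1 - F(t)], with F assumed to be the standard normal d.f."""
    return -2.0 * math.log(normal_upper_tail(t))


def k_over_n(theta: float, n: int) -> float:
    """K_n / n evaluated at the typical non-null value T_n = sqrt(n) * theta."""
    return k_statistic(math.sqrt(n) * theta) / n


theta = 0.5  # hypothetical non-null mean; here c(theta) = theta ** 2
for n in (100, 1000, 4000):
    # The printed ratio should drift towards theta ** 2 as n grows.
    print(n, round(k_over_n(theta, n), 4), "target", theta ** 2)
```

The ratio approaches $\theta^2$ from above because $-2\log[1 - \Phi(t)]$ exceeds $t^2$ by a term of order $\log t$, which is negligible after division by n.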

As an immediate consequence of proposition 1 we have 

Proposition 2. If

(vii) $0 < \liminf_{n\to\infty} \{\beta_n(\alpha_n \mid \theta)\} \le \limsup_{n\to\infty} \{\beta_n(\alpha_n \mid \theta)\} < 1$,

then

(viii) $\lim_{n\to\infty} \{v_n/n\} = c(\theta)$.

It should be observed that there may exist no sequence $\{\alpha_n\}$ such that (vii) is
satisfied. We shall then say that $\{T_n\}$ is degenerate at $\theta$. Although degeneracy can
scarcely occur in the applications, it is necessary to take it into account in the
general case.

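Next, let $\{T_n^{(1)}\}$ and $\{T_n^{(2)}\}$ be two standard sequences, and let $1 - \beta_n^{(i)}(\alpha \mid \theta)$
denote the power function of the size $\alpha$ test based on $T_n^{(i)}$, i = 1, 2. For each n,
let $\delta_n(1, 2 \mid \theta) = \sup_\alpha [\beta_n^{(2)}(\alpha \mid \theta) - \beta_n^{(1)}(\alpha \mid \theta)]$. It is easily seen (e.g. from (vi))
that $0 \le \delta_n \le 1$. Let us say that $\{T_n^{(2)}\}$ dominates $\{T_n^{(1)}\}$ at $\theta$ if

(ix) $\lim_{n\to\infty} \delta_n(1, 2 \mid \theta) = 0$.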

Proposition 3. If $\{T_n^{(2)}\}$ dominates $\{T_n^{(1)}\}$ at $\theta$, then $\varphi_{1,2}(\theta) \le 1$.

Proof. Suppose first that $\{T_n^{(2)}\}$ is not degenerate. Let $\{\alpha_n\}$ be a sequence such
that (vii) holds with $\beta = \beta^{(2)}$, and let $v_n$ be defined by (iii). Then $c_2(\theta) = \lim (v_n/n)$
by proposition 2. Since $\{T_n^{(2)}\}$ dominates $\{T_n^{(1)}\}$, we have

$$\liminf_{n\to\infty} \beta_n^{(1)}(\alpha_n \mid \theta) \ge \liminf_{n\to\infty} \beta_n^{(2)}(\alpha_n \mid \theta) > 0.$$

Hence $c_1(\theta) \le \liminf (v_n/n)$ by (iv). Thus $c_1(\theta) \le c_2(\theta)$.

Suppose now that $\{T_n^{(2)}\}$ is degenerate at $\theta$. Let t be a non-negative random
variable with a continuous distribution function (e.g. a chi-square with 1 d.f.),
independent of s, and let $\{\lambda_n\}$ be a sequence of positive constants with
$\lim_{n\to\infty} \lambda_n = 0$ (e.g. $\lambda_n = 1/n$). Define $T_n^{(3)} = [K_n^{(2)} + \lambda_n t]^{1/2}$. It is readily seen
that $\{T_n^{(3)}\}$ is a standard sequence on the space $S^*$ of points (s, t), that $c_3 = c_2$,
and that $\beta_n^{(3)}(\alpha \mid \theta) = P_\theta(K_n^{(2)} + \lambda_n t < v)$, where $v = 2 \log (1/\alpha)$. Since $\lambda_n t \ge 0$,
and since $\beta_n^{(2)}(\alpha \mid \theta) = P_\theta(K_n^{(2)} < v)$, it follows that $\{T_n^{(3)}\}$ dominates $\{T_n^{(2)}\}$ and
hence also $\{T_n^{(1)}\}$. For each n, the distribution of $K_n^{(2)} + \lambda_n t$ is continuous when
$\theta$ obtains, so that $\{T_n^{(3)}\}$ is not degenerate. Hence $c_1 \le c_3\ (= c_2)$ by the preceding
paragraph. This completes the proof.

The following is a partial converse of proposition 3. 

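Proposition 4. If $\varphi_{1,2} < 1$, then $\{T_n^{(2)}\}$ dominates $\{T_n^{(1)}\}$.

Proof. For any $\alpha$ we have

$$\begin{aligned}
\text{(x)}\quad \beta_n^{(2)}(\alpha \mid \theta) &= P_\theta(K_n^{(2)} < v) && \text{by (vi)} \\
&= P_\theta(K_n^{(2)} < v,\ K_n^{(1)} < v) + P_\theta(K_n^{(2)} < v,\ K_n^{(1)} \ge v) \\
&\le P_\theta(K_n^{(1)} < v) + P_\theta(K_n^{(2)} < K_n^{(1)}) \\
&= \beta_n^{(1)}(\alpha \mid \theta) + P_\theta(K_n^{(2)} < K_n^{(1)}) && \text{by (vi)}.
\end{aligned}$$

Since $K_n^{(1)}/K_n^{(2)} \to \varphi$ in probability (cf. (10)), and since $\varphi < 1$,
$P_\theta(K_n^{(2)} < K_n^{(1)}) \to 0$ as $n \to \infty$. It follows hence from (x), as desired, that (ix) is satisfied.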

It follows from propositions 3 and 4 that $\varphi_{1,2} < 1$ if and only if sequence 2
dominates sequence 1 but sequence 1 does not dominate sequence 2. It also follows
that $\varphi_{1,2} = 1$ if and only if (a) each sequence dominates the other, or (b)
neither sequence dominates the other. It can be shown by simple examples that 
contingency (b) does occur, i.e. in the general case, domination induces only a 
partial ordering of the class of all standard sequences. 





Appendix 2. In this appendix we discuss $\psi$, the limit of $\varphi$ as $\theta$ tends to a null
value, in a special context. Suppose that $\Omega$ is an interval on the real line, and that
$H: \theta = \theta_0$, where $\theta_0$ is a point in $\Omega$. Let $\{U_n\}$ be a sequence of statistics on S
such that the following conditions are satisfied for each $\theta$ in $\Omega$.

(A) $\mu_n(\theta) = E(U_n \mid \theta)$ and $\sigma_n^2(\theta) = \mathrm{Var}(U_n \mid \theta)$ exist, $0 < \sigma_n < \infty$;
(B) $V_n(s, \theta) = [U_n - \mu_n(\theta)]/\sigma_n(\theta)$ is asymptotically normally distributed with zero mean and unit variance;
(C) with $A_n(\theta) = [\mu_n(\theta) - \mu_n(\theta_0)]/\sigma_n(\theta_0)$, $\lim_{n\to\infty} A_n(\theta)/n^{1/2} = b(\theta)$ (say), where $b \ne 0$ for $\theta \ne \theta_0$;
(D) $[\sigma_n(\theta)/\sigma_n(\theta_0)]/n^{1/2} \to 0$ as $n \to \infty$; and
(E) there exist an even positive integer k and a positive constant A such that $[b(\theta)]^2 = A \cdot (\theta - \theta_0)^k [1 + o(1)]$ as $\theta \to \theta_0$.

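Suppose that $\{U_n^{(1)}\}$ and $\{U_n^{(2)}\}$ are two sequences satisfying conditions (A)-(E),
and let $A_n^{(i)}(\theta)$, $b_i(\theta)$, $k_i$, and $A_i$ be the corresponding parametric functions
and constants, i = 1, 2. Define $T_n^{(i)} = |V_n^{(i)}(s, \theta_0)|$. It is then readily seen from
(A)-(D) that $\{T_n^{(i)}\}$ is a standard sequence with slope $[b_i(\theta)]^2$. Hence
$\varphi_{1,2} = [b_1/b_2]^2$. It now follows from (E) that

$$\text{(xi)}\quad \psi_{1,2}(\theta_0) = \lim_{\theta\to\theta_0} \varphi_{1,2}(\theta) =
\begin{cases} 0 & \text{if } k_1 > k_2, \\ A_1/A_2 & \text{if } k_1 = k_2, \\ \infty & \text{if } k_1 < k_2. \end{cases}$$

It follows from (C) that we also have

(xii) $\psi_{1,2}(\theta_0) = \lim_{\theta\to\theta_0} \lim_{n\to\infty} [A_n^{(1)}(\theta)/A_n^{(2)}(\theta)]^2$.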


The right side of (xii) is closely related to Pitman's formula for the relative limiting
efficiency, and becomes identical with the latter under certain additional conditions.
Suppose, for example, that $k_1 = k_2 = 2$, that $A_n^{(i)}$ is a continuously
differentiable function of $\theta$, $dA_n^{(i)}/d\theta = A_n^{(i)\prime}(\theta)$ say, and that condition (C) is
satisfied uniformly in a neighbourhood of $\theta_0$ by both sequences. In this case, by
first interchanging the order of the two limits in (xii), and then using the
differentiability conditions, we obtain

(xiii) $\psi_{1,2}(\theta_0) = \lim_{n\to\infty} [A_n^{(1)\prime}(\theta_0)/A_n^{(2)\prime}(\theta_0)]^2$.

Suppose next that $\alpha$ and $\beta$ are constants, $0 < \alpha < 1 - \beta < 1$, and $\{\theta_n\}$ is a
sequence in $\Omega$ such that

(xiv) $\lim_{n\to\infty} [A_n^{(i)\prime}(\theta_n)/A_n^{(i)\prime}(\theta_0)] = 1$, $\lim_{n\to\infty} \beta_n^{(i)}(\alpha \mid \theta_n) = \beta$, (i = 1, 2).

It then follows from (xiii) and the first part of (xiv) that

(xv) $\psi_{1,2}(\theta_0) = \lim_{n\to\infty} [A_n^{(1)\prime}(\theta_n)/A_n^{(2)\prime}(\theta_n)]^2$.

Since the right side of (xv) is Pitman's formula, we see from the second part of
(xiv) that $\psi$ is the asymptotic efficiency of sequence 1 relative to 2 in Pitman's
sense. As far as calculation of $\psi$ is concerned, however, (xii), (xiii) or (xv) are
not required since $\psi$ is already given by (xi).
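
To see how (xi) is used in a familiar case, the following sketch compares the sign test (sequence 1) with the test based on the sample mean (sequence 2) for observations assumed to be normal with mean $\theta$ and unit variance, with $\theta_0 = 0$. This example is an assumption made for illustration and is not worked out in the text. Under it, $b_1(\theta) = 2\Phi(\theta) - 1$ and $b_2(\theta) = \theta$, so that $k_1 = k_2 = 2$, $A_1 = 2/\pi$, $A_2 = 1$, and (xi) gives $\psi_{1,2}(0) = 2/\pi$.

```python
import math


def b_mean(theta: float) -> float:
    # U_n = sum of the observations: mu_n = n * theta, sigma_n = sqrt(n),
    # so A_n(theta) / sqrt(n) = theta.
    return theta


def b_sign(theta: float) -> float:
    # U_n = number of positive observations: mu_n = n * Phi(theta),
    # sigma_n(0) = sqrt(n) / 2, so A_n(theta) / sqrt(n) = 2 * Phi(theta) - 1,
    # which equals erf(theta / sqrt(2)).
    return math.erf(theta / math.sqrt(2.0))


def phi_sign_vs_mean(theta: float) -> float:
    """phi_{1,2}(theta) = [b_1(theta) / b_2(theta)] ** 2 for theta != 0."""
    return (b_sign(theta) / b_mean(theta)) ** 2


for theta in (2.0, 1.0, 0.1, 0.001):
    # As theta tends to the null value 0 the ratio should approach 2 / pi.
    print(theta, phi_sign_vs_mean(theta))
print("2/pi =", 2.0 / math.pi)
```

The ratio of slopes depends on $\theta$ away from the null value, and only its limit coincides with the Pitman efficiency.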







Appendix 3. Under certain conditions the slope of a standard sequence $\{T_n\}$
can be expressed as the limit, as $n \to \infty$, of $n^{-1}$ times the expected ratio of the
power of the test based on $T_n$ to its size, with the size chosen at random according
to a certain fixed distribution. This representation of a slope seems to be of
some interest, partly because slopes are considered in Sections 2 and 3 of the
paper without reference to testing at a preassigned level.

Suppose, for example, that $T_n$ is a sequence such that in the null case $T_n$ is
asymptotically normally distributed with zero mean and unit variance (conditions
I and II, with a = 1), and that in the non-null case

(xvi) $\lim_{n\to\infty} E\left(\dfrac{T_n}{\sqrt{n}} - b\right)^2 = 0$,

where b is the parametric function specified in Condition III. For any given u > 0,
consider the following test: reject H if and only if $|T_n| \ge u$. Let $\alpha$ be the
(approximate) size and $\gamma_n$ the power ($\gamma = 1 - \beta$) of this test, and let $\rho_n = \gamma_n/\alpha$, i.e.

(xvii) $\alpha(u) = \sqrt{2/\pi} \int_u^\infty e^{-t^2/2}\,dt$, $\quad \gamma_n(u \mid \theta) = P(|T_n| \ge u \mid \theta)$, $\quad \rho_n(u \mid \theta) = \gamma_n(u \mid \theta)/\alpha(u)$.

Let U be a random variable taking values in $(0, \infty)$ according to

(xviii) $P(U \le u) = \sqrt{8/\pi} \int_0^u t \left[\int_t^\infty e^{-z^2/2}\,dz\right] dt$.

We then have

(xix) $c(\theta) = \lim_{n\to\infty} \dfrac{1}{n} E[\rho_n(U \mid \theta)]$

for every non-null $\theta$. It follows from (xix), in particular, that in the non-null case
$K_n/E(\rho_n) \to 1$ in probability.

To verify (xix), we note that $c = ab^2 = b^2$, and that $b^2$ is the limit of $n^{-1}E(T_n^2)$,
by (xvi). Since $E[T_n^2 \mid \theta] = \int_0^\infty P(T_n^2 \ge t \mid \theta)\,dt$, it follows that

(xx) $c(\theta) = \lim_{n\to\infty} \dfrac{1}{n} \int_0^\infty P(|T_n| \ge \sqrt{t} \mid \theta)\,dt$.

The desired conclusion follows from (xvii) and (xviii) by a change of variable
in the integral on the right side of (xx).
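
The change of variable $t = u^2$ turns the integral in (xx) into $\int_0^\infty 2u\,P(|T_n| \ge u \mid \theta)\,du$. The sketch below checks this tail-integral identity numerically in a hypothetical case that is assumed only for illustration: a single variable $T$ that is normal with mean m and unit variance, for which $E(T^2) = m^2 + 1$. The trapezoid rule, the step size and the truncation point are choices made here, not part of the paper.

```python
import math


def two_sided_tail(u: float, m: float) -> float:
    """P(|T| >= u) when T is normal with mean m and unit variance."""
    s = math.sqrt(2.0)
    return 0.5 * math.erfc((u - m) / s) + 0.5 * math.erfc((u + m) / s)


def second_moment_from_tails(m: float, step: float = 1e-3) -> float:
    """Trapezoid approximation of the integral of 2 * u * P(|T| >= u) over u >= 0."""
    upper = abs(m) + 12.0  # the integrand is negligible beyond this point
    n_steps = int(upper / step)
    total = 0.0
    for i in range(n_steps + 1):
        u = i * step
        weight = 0.5 if i in (0, n_steps) else 1.0  # end points get half weight
        total += weight * 2.0 * u * two_sided_tail(u, m)
    return total * step


m = 3.0  # hypothetical mean of T
print(second_moment_from_tails(m), "should be close to", m * m + 1.0)
```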

The formula for c obtained above is perhaps the simplest one in a class of such
formulae. To obtain another member of the class, we note from (xvi) and b > 0
that b is the limit of $n^{-1/2}E(|T_n|)$. It follows hence that

(xxi) $c(\theta) = \dfrac{2}{\pi} \lim_{n\to\infty} \dfrac{1}{n} E^2[\rho_n(V \mid \theta)]$,

where V is distributed in $(0, \infty)$ according to

(xxii) $P(V \le v) = \int_0^v \left[\int_t^\infty e^{-z^2/2}\,dz\right] dt$.


REFERENCES 
[1] Bahadur, R. R., "Simultaneous comparison of the sign and optimum tests of a normal
mean," Hotelling Festschrift, Stanford University Press, Stanford, Calif., 1960,
to be published.
[2] Anderson, T. W. and Goodman, Leo A., "Statistical inference about Markov chains,"
Ann. Math. Stat., Vol. 28 (1957), pp. 89-109.
[3] Pitman, E. J. G., Lecture notes on nonparametric inference, Columbia University, New
York, 1949, unpublished.
[4] Noether, Gottfried E., "A theorem of Pitman," Ann. Math. Stat., Vol. 26 (1955),
pp. 64-68.
[5] Hoeffding, Wassily, and Rosenblatt, Joan Raup, "The efficiency of tests," Ann.
Math. Stat., Vol. 26 (1955), pp. 52-63.
[6] Chernoff, Herman, "A measure of asymptotic efficiency for tests of a hypothesis
based on the sum of observations," Ann. Math. Stat., Vol. 23 (1952), pp. 493-507.
[7] Hodges, J. L. and Lehmann, E. L., "The efficiency of some nonparametric
competitors of the t test," Ann. Math. Stat., Vol. 27 (1956), pp. 324-335.
[8] Bahadur, R. R., "On the asymptotic efficiency of tests and estimates," Sankhyā, 1960,
to appear.
[9] Feller, W., An Introduction to Probability Theory and Its Applications, Vol. I, 2nd ed.,
John Wiley and Sons, New York, 1957.
[10] Wald, A. and Wolfowitz, J., "On a test whether two samples are from the same
population," Ann. Math. Stat., Vol. 11 (1940), pp. 147-162.
[11] Kruskal, William H., "A nonparametric test for the several sample problem,"
Ann. Math. Stat., Vol. 23 (1952), pp. 525-540.
[12] Kendall, M. G., Rank Correlation Methods, 2nd ed., Charles Griffin and Co., London,
1955.
[13] Cochran, William G., "The χ² test of goodness of fit," Ann. Math. Stat., Vol. 23 (1952),
pp. 315-345.





SLIPPAGE PROBLEMS¹


By Samuel Karlin and Donald Truax
Stanford University


1. Introduction. Slippage problems have been considered in the literature by
Mosteller [6], Paulson [8], Truax [11], Doornbos and Prins [2], Kudo [5], and
others. Roughly, the problem is as follows: We wish to compare n populations
which have density functions $f(x, \theta_1), f(x, \theta_2), \ldots, f(x, \theta_n)$. On the basis of a
sample from each population we want to decide if all the $\theta_i$ are equal, or, if not,
which is the largest. Actually, a more restricted problem is considered in this 
paper, in which either all parameter values are equal, or all but one are equal 
and the exceptional one is larger. If the ith one is larger we will say it has slipped 
to the right. These slippage problems have certain similarities with the problem 
of ranking means considered by Bechhofer and others [1], but differ in that the 
latter deal mostly with procedures guaranteeing with prescribed probability 
the selection of the population with the largest parameter, where it is known in 
advance that one parameter exceeds all the others. These authors never allow 
the possibility that all parameters of the various populations are equal, which, 
in our situation, is called hypothesis zero. Other contrasts between the two 
problems will become apparent in our later discussions. 

A slightly different problem can be formulated in which we have in addition 
a control population. The problem is then to compare the n populations with 
the control, and decide if all the parameters are equal to the parameter of the 
control population, or, if not, which of the n populations has the larger parameter. 
In order to obtain optimal solutions to the slippage problems, certain invariance
restrictions will be imposed. Notice the obvious symmetry: if $X_1, X_2, \ldots, X_n$
is observed ($X_i$ is an observation from the ith population) and if
action j is appropriate (i.e., the jth parameter has slipped to the right), then if
a permutation $X_{\pi 1}, X_{\pi 2}, \ldots, X_{\pi n}$ is observed, action $\pi j$ is appropriate. This
suggests restricting attention to symmetric procedures. That is, if
$\varphi_i(X_1, X_2, \ldots, X_n)$ denotes the probability of taking action i when $X_1, X_2, \ldots, X_n$
is observed, then we will require $\varphi_{\pi i}(X_{\pi 1}, X_{\pi 2}, \ldots, X_{\pi n}) = \varphi_i(X_1, X_2, \ldots, X_n)$
for all permutations $(1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$. We will further restrict
attention exclusively to those problems in which it is possible to reduce the 
problem, by invariance, to a one parameter problem. In particular we will 
investigate several cases where the parameter is a translation or scale parameter. 

The nature of the Bayes solutions will be examined for these problems. The
Bayes solutions are usually fairly easy to characterize, and many problems lead
us to complete classes of solutions. We will show that any symmetric Bayes solution,
which usually is Bayes against a symmetric distribution, can be explicitly
evaluated.

Received June 16, 1959.

¹ This work was supported in part by an Office of Naval Research contract at Stanford
University.

One can conceive of more general problems than those we will discuss here. 
For example, we have considered only slippage, in the case of a one dimensional 
parameter, to the right. A more general problem would be that in which the 
direction of the slippage is not specified. Also, one might consider the problem in 
which a subset of the parameters has slipped, and we are to decide which subset 
it is, etc. Modifications of our arguments would apply to these more general 
problems. 

In Section 2 we introduce the pertinent definitions and terminology. In Sec- 
tion 3 some preliminary lemmas are proved and the Bayes solutions are character- 
ized in general form. In the following section the theory is applied to several 
examples, including the slippage problem of the means of normal populations 
with common variance and the slippage problem of the parameter of a Gamma
family of distributions. Part of this discussion deals with known examples in a 
more direct manner, while other examples are new. 

In Section 5 we study the slippage problem for populations having an un- 
known translation parameter. The problem is set up in a non-parametric form. 
The solutions of the slippage problems of normal variables and exponential 
variables are obtained by applying the theory of the translation parameter slip- 
page problem in the case of the existence of a sufficient statistic. In a similar 
manner, the symmetric invariant Bayes solutions are explicitly determined in 
the case of slippage of a scale parameter possessing a sufficient statistic. Mixed 
translation and scale parameter problems are discussed in the following section. 

In Section 8 it is shown under fairly general conditions that the symmetric 
Bayes procedures, characterized in the earlier sections, are uniformly most 
powerful amongst all symmetric procedures having the same error of rejecting 
hypothesis zero, when it is true. 

In Section 9 we discuss a multivariate slippage problem. Two slippage prob- 
lems for non-parametric situations are introduced in Section 10. Some decision 
procedures based on rank tests are proposed for their solution. In the last section 
a few remarks are offered about computing the critical numbers defining the 
symmetric Bayes solutions. 


2. Preliminaries and definitions. The slippage problem can be formulated in 
the following decision theoretic way. We observe an n-dimensional random 
variable $X = (X_1, X_2, \ldots, X_n)$ distributed according to a density function
$p(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_n)$ which is known except for the parameter
point $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$ where the $\theta_i$ are real numbers. We assume the
following symmetry of the density:

$$p(x_{\pi 1}, x_{\pi 2}, \ldots, x_{\pi n}; \theta_{\pi 1}, \theta_{\pi 2}, \ldots, \theta_{\pi n}) = p(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_n)$$

for all permutations $(1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$. There are n + 1 available
actions which we will call $a_0, a_1, \ldots, a_n$, and the loss in taking action $a_i$
when $\theta$ is the true parameter point is $L_i(\theta)$. The loss functions are assumed
to have the following properties.

(1) $L_{\pi j}(\theta_{\pi 1}, \theta_{\pi 2}, \ldots, \theta_{\pi n}) = L_j(\theta_1, \theta_2, \ldots, \theta_n)$ for all permutations
$(1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$ and for all j. (We may include the case
j = 0 by defining $\pi 0 = 0$.)

(2) $L_j(\theta) < L_i(\theta)$ for all $i \ne j$ if $\theta_j = \omega + \Delta$ for some real $\omega$ and $\Delta > 0$,
and $\theta_i = \omega$ for all $i \ne j$. $L_0(\theta) < \min_{1 \le i \le n} L_i(\theta)$ if $\theta_i = \omega$ for some real $\omega$,
$i = 1, 2, \ldots, n$. Otherwise, $L_0(\theta) = L_1(\theta) = \cdots = L_n(\theta)$.


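For the problem in which there is a control population, a modification of the
above is needed. In this case $(X_1, X_2, \ldots, X_n, Y)$ is observed according to a
density

$$p(x_1, x_2, \ldots, x_n, y; \theta_1, \ldots, \theta_n, \theta),$$

which satisfies

$$p(x_{\pi 1}, x_{\pi 2}, \ldots, x_{\pi n}, y; \theta_{\pi 1}, \theta_{\pi 2}, \ldots, \theta_{\pi n}, \theta) = p(x_1, x_2, \ldots, x_n, y; \theta_1, \ldots, \theta_n, \theta)$$

for all permutations $\pi: (1, 2, \ldots, n) \to (\pi 1, \ldots, \pi n)$. The loss functions satisfy

(1') $L_{\pi j}(\theta_{\pi 1}, \ldots, \theta_{\pi n}, \theta) = L_j(\theta_1, \ldots, \theta_n, \theta)$ for all permutations
$\pi: (1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$ and for all j. (We include j = 0 by defining
$\pi 0 = 0$.)

(2') $L_j(\theta_1, \ldots, \theta_n, \theta) < L_i(\theta_1, \ldots, \theta_n, \theta)$ for all $i \ne j$ if $\theta_j = \theta + \Delta$ for
some $\Delta > 0$ and $\theta_i = \theta$ for all $i \ne j$. $L_0(\theta_1, \ldots, \theta_n, \theta) < L_i(\theta_1, \ldots, \theta_n, \theta)$
$(1 \le i \le n)$ if $\theta_i = \theta$ for all i. Otherwise, $L_0(\theta_1, \ldots, \theta_n, \theta) = \cdots = L_n(\theta_1, \ldots, \theta_n, \theta)$.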

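DEFINITION 2.1: A symmetric decision function is a vector function
$\varphi = (\varphi_0, \varphi_1, \ldots, \varphi_n)$ on $R^n$ with

(i) $0 \le \varphi_i(x) \le 1$,

(ii) $\sum_{i=0}^n \varphi_i(x) = 1$,

(iii) $\varphi_{\pi j}(x_{\pi 1}, x_{\pi 2}, \ldots, x_{\pi n}) = \varphi_j(x_1, x_2, \ldots, x_n)$ for all permutations
$(1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$ and all j (including j = 0, with $\pi 0 = 0$).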


3. Bayes solutions. 
DEFINITION 3.1: If $\Phi$ denotes the set of all possible decision functions, $\Theta$
the parameter space, the risk function $\rho$ is a function defined on $\Phi \times \Theta$ by

$$\rho(\varphi, \theta) = \int \sum_{i=0}^n L_i(\theta)\varphi_i(x)p(x; \theta)\,dx$$

where $x = (x_1, \ldots, x_n)$, $\theta = (\theta_1, \ldots, \theta_n)$, and dx denotes ordinary Lebesgue
measure on $R^n$.

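LEMMA 3.1: The risk function of a symmetric decision function $\varphi$ is a symmetric
function of $\theta$.

PROOF: Let $(1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$ be a permutation and let
$\pi 0 = 0$. Then

$$\int \sum_j L_j(\theta)\varphi_j(x)p(x; \theta)\,dx = \int \sum_j L_{\pi j}(\theta_\pi)\varphi_{\pi j}(x_\pi)p(x_\pi; \theta_\pi)\,dx_\pi = \rho(\varphi, \theta_\pi),$$

where $\theta_\pi = (\theta_{\pi 1}, \theta_{\pi 2}, \ldots, \theta_{\pi n})$.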


DEFINITION 3.2: A decision function $\varphi^*$ is said to be Bayes against a distribution
F if

$$\int \rho(\varphi^*, \theta)\,dF(\theta) = \min_{\varphi} \int \rho(\varphi, \theta)\,dF(\theta).$$


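THEOREM 3.1: Any symmetric Bayes solution is Bayes against a symmetric
distribution.

PROOF: Let $\varphi$ be symmetric. Then by Lemma 3.1, $\rho(\varphi, \theta) = \rho(\varphi, \theta_\pi)$ for all
permutations $(1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$. If $\varphi$ is Bayes against a distribution
F, $\varphi$ minimizes

$$\int \rho(\varphi, \theta)\,dF(\theta) = \int \rho(\varphi, \theta_\pi)\,dF(\theta_\pi) = \int \rho(\varphi, \theta)\,dF(\theta_\pi).$$

Define the distribution function $F^*(\theta)$ by $F^*(\theta) = (1/n!) \sum_\pi F(\theta_\pi)$ where the
sum is taken over all permutations. $F^*$ is symmetric and $\varphi$ is Bayes against $F^*$
since $\int \rho(\varphi, \theta)\,dF(\theta) = \int \rho(\varphi, \theta)\,dF^*(\theta)$.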
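
The symmetrization $F^* = (1/n!) \sum_\pi F(\theta_\pi)$ can be illustrated for a prior with finitely many mass points. The sketch below is only an illustration under that assumption: the prior is stored as a dictionary from parameter points to masses, and the function name and the example prior are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def symmetrize(prior: dict) -> dict:
    """Average a discrete prior over all permutations of the coordinates of theta."""
    n = len(next(iter(prior)))
    perms = list(permutations(range(n)))
    weight = Fraction(1, len(perms))  # 1 / n!
    sym = {}
    for theta, mass in prior.items():
        for perm in perms:
            theta_perm = tuple(theta[i] for i in perm)
            sym[theta_perm] = sym.get(theta_perm, Fraction(0)) + weight * mass
    return sym


# Hypothetical prior: no slippage with probability 1/2, population 1 slipped otherwise.
prior = {(0, 0, 0): Fraction(1, 2), (1, 0, 0): Fraction(1, 2)}
for theta, mass in sorted(symmetrize(prior).items()):
    print(theta, mass)
```

After symmetrization the mass originally placed on "population 1 has slipped" is shared equally among the three populations, while the mass on the null point is unchanged.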

It is well known ([4], p. 279) that the Bayes solutions have the form

$$\varphi_i(x) = 1 \text{ if } \int L_i(\theta)p(x; \theta)\,dF(\theta) < \int L_j(\theta)p(x; \theta)\,dF(\theta) \text{ for all } j \ne i.$$

In computing the Bayes procedures we examine expressions of the type

$$\int [L_i(\theta) - L_j(\theta)]p(x; \theta)\,dF(\theta)$$

for changes in sign. Since $L_i(\theta) - L_j(\theta) = 0$ for all those points $\theta$ not of the
form $\theta_k = \omega$ for all k except possibly a single value, where $\theta_i = \omega + \Delta$, $\Delta \ge 0$,
we may restrict attention to those distributions F whose spectrum is contained
in this set of points. If we denote this set by $\Omega$ we can identify the points of $\Omega$
with $\{(i, \omega, \Delta)\}$ by the correspondence $(\theta_1, \theta_2, \ldots, \theta_n) \leftrightarrow (i, \omega, \Delta)$ if $\theta_j = \omega$
for some real $\omega$ for indices j with the exception of i where $\theta_i = \omega + \Delta$, $\Delta \ge 0$.

Let F denote a distribution function on $\Omega$. Then define $\xi_0$ as the probability
(under F) that $\Delta = 0$. Define $\xi_i$ as the conditional probability when the
exceptional index is i given $\Delta > 0$. Let $F_0(\omega)$ be the conditional distribution of
$\omega$ given $\Delta = 0$, and $F_i(\omega, \Delta)$ the conditional distribution of $(\omega, \Delta)$ given the
exceptional index i where $\Delta > 0$. Note that F is symmetric if and
only if $\xi_1 = \xi_2 = \cdots = \xi_n$, $F_1 = F_2 = \cdots = F_n$.

We now proceed to examine the Bayes procedures. In order to facilitate
the study we state some lemmas concerning the loss functions on $\Omega$.





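LEMMA 3.2: If $i \ne j$, $i \ne k$, $j, k > 0$, then $L_j(i, \omega, \Delta) - L_k(i, \omega, \Delta) = 0$.²
(This includes the case $\Delta = 0$.)

PROOF: Consider any permutation $(1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$, such
that $\pi i = i$, $\pi j = k$, $\pi k = j$. Then

$$L_j(i, \omega, \Delta) - L_k(i, \omega, \Delta) = L_{\pi j}(\pi i, \omega, \Delta) - L_{\pi k}(\pi i, \omega, \Delta) = L_k(i, \omega, \Delta) - L_j(i, \omega, \Delta).$$

The difference thus equals its own negative and must vanish.

LEMMA 3.3: If $i \ne j$, $k \ne l$, then $L_i(j, \omega, \Delta) = L_k(l, \omega, \Delta)$.

PROOF: Consider any permutation $(1, 2, \ldots, n) \to (\pi 1, \pi 2, \ldots, \pi n)$ such
that $\pi i = k$, $\pi j = l$. Then $L_i(j, \omega, \Delta) = L_{\pi i}(\pi j, \omega, \Delta) = L_k(l, \omega, \Delta)$.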

If we let $L_i(\omega)$ denote the loss function when action i is taken and $\Delta = 0$,
and let $p_j(x; \omega, \Delta)$ denote the density when the parameter is $(j, \omega, \Delta)$, then the
computation of the Bayes solution reduces to consideration of the expressions

$$\int \sum_{k=0}^n [L_i(k, \omega, \Delta) - L_j(k, \omega, \Delta)]p_k(x; \omega, \Delta)\,dF(k, \omega, \Delta)$$
$$= \xi_0 \int [L_i(\omega) - L_j(\omega)]p_0(x; \omega)\,dF_0(\omega) + \sum_{k=1}^n \xi_k \int [L_i(k, \omega, \Delta) - L_j(k, \omega, \Delta)]p_k(x; \omega, \Delta)\,dF_k(\omega, \Delta).$$

When we are seeking only symmetric Bayes procedures, we may, by Theorem
3.1, take $\xi_1 = \cdots = \xi_n = \xi$, $F_1 = F_2 = \cdots = F_n = F$.

A detailed study of the Bayes procedures will require an extension of the 
notion of densities possessing a monotone likelihood ratio. The usual idea of a 
monotone likelihood ratio is given below. 

DEFINITION 3.3: A function $f(x, y)$ defined on $R^2$ is said to have a monotone
likelihood ratio if for any choice $x_1 < x_2$, $y_1 < y_2$, we have $\det \| f(x_i, y_j) \| \ge 0$.

The fundamental property possessed by these functions is summarized in 
the following lemma. A proof may be found in [4]. 

LEMMA 3.4: If $h(y)$ is a function which changes sign at most once from positive
to negative values, and $f(x, y)$ has a monotone likelihood ratio, then

$$g(x) = \int f(x, y)h(y)\,d\mu(y)$$

changes sign at most once in the same direction as h. Here, $\mu$ denotes any positive
$\sigma$-finite measure.
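
The determinant condition of Definition 3.3 is easy to probe numerically on a finite grid. The sketch below does this for the normal location density, which is assumed here only as an illustrative choice of $f(x, y)$. A check on a grid can refute the property but cannot prove it, and the function names and the tolerance are choices made for the illustration.

```python
import math
from itertools import combinations


def normal_location_density(x: float, y: float) -> float:
    """f(x, y): density at x of a normal variable with mean y and unit variance."""
    return math.exp(-0.5 * (x - y) ** 2) / math.sqrt(2.0 * math.pi)


def has_mlr_on_grid(f, xs, ys, tol: float = 1e-15) -> bool:
    """Check det ||f(x_i, y_j)|| >= 0 for every x1 < x2 and y1 < y2 taken from the grids."""
    for x1, x2 in combinations(sorted(xs), 2):
        for y1, y2 in combinations(sorted(ys), 2):
            det = f(x1, y1) * f(x2, y2) - f(x1, y2) * f(x2, y1)
            if det < -tol:
                return False
    return True


grid = [i / 2.0 for i in range(-6, 7)]
print(has_mlr_on_grid(normal_location_density, grid, grid))
```

For the normal location density the determinant is a positive multiple of $e^{x_1 y_1 + x_2 y_2} - e^{x_1 y_2 + x_2 y_1}$, which is positive because $(x_2 - x_1)(y_2 - y_1) > 0$.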

We now generalize the concept of a monotone likelihood ratio for joint densities 
of n variables that depend on n parameters. (See also Pratt [9]. ) 

DEFINITION 3.4: Let A and B be arbitrary sets. For each $\alpha \in A$, $\beta \in B$, let
$x^\alpha(t) = (x_1^\alpha(t), x_2^\alpha(t), \ldots, x_n^\alpha(t))$ and $\theta^\beta(s) = (\theta_1^\beta(s), \theta_2^\beta(s), \ldots, \theta_n^\beta(s))$
be curves in $R^n$. A density $f(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_n)$ is said
to have a monotone likelihood ratio with respect to the family of pairs of curves
$\{(x^\alpha(t), \theta^\beta(s)); \alpha \in A, \beta \in B\}$ if for every $\alpha \in A$, $\beta \in B$, $f(x^\alpha(t); \theta^\beta(s))$ has a
monotone likelihood ratio in the variables t and s.

² The notation $L_j(i, \omega, \Delta)$ means that we are evaluating $L_j$ at the parameter points in
the set $\{i, \omega, \Delta\}$.

Most of our discussion of Bayes procedures will be based on the following 
assumption on the densities. 

(A) For j > 0, k > 0,

$$p_j(x_1, x_2, \ldots, x_n; \omega, \Delta) \ge p_k(x_1, x_2, \ldots, x_n; \omega, \Delta)$$

if and only if $x_j \ge x_k$, or strict inequality in both instances.

The following theorem provides one condition which implies the validity of 
assumption (A). 

THEOREM 3.2: Let $\gamma(s)$ and $\delta(t)$ be continuous strictly increasing functions
defined for all real s and t, $\gamma(0) = \delta(0) = 0$, and such that $\gamma(s)$ and $\delta(t)$ range from
$-\infty$ to $+\infty$. Define a family of pairs of curves with the following properties: the
curves $x(t)$, $\theta(s)$ belong to the family if and only if for some j, k and real numbers
a and b,

$x_j(t) = a + \delta(t)$, $x_k(t) = a + \delta(-t)$, $x_i(t)$ constant for $i \ne j$, $i \ne k$;

$\theta_j(s) = b + \gamma(s)$, $\theta_k(s) = b + \gamma(-s)$, $\theta_i(s)$ constant for $i \ne j$, $i \ne k$.

If the density $p(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_n)$ has a
strict monotone likelihood ratio with respect to this family of pairs of curves, then
assumption (A) is satisfied.

REMARK: Our main applications involve the curves $\delta(t) = t$ and $\gamma(s) = s$.

PROOF: Suppose $x_j \ge x_k$. We can find a pair of curves $(x(t), \theta(s))$ in the
family so that for some $t_0 \ge 0$, $x = x(t_0)$ where $x_j = a + \delta(t_0)$, $x_k = a + \delta(-t_0)$
and such that for some $s_0 > 0$, $\theta(s_0)$ corresponds to the density $p_j(x; \omega, \Delta)$
and $\theta(-s_0)$ corresponds to the density $p_k(x; \omega, \Delta)$. Then
$[p(x(t); \theta(s_0))]/[p(x(t); \theta(-s_0))]$ is a monotone strictly increasing function of t. But at t = 0
this ratio is one, since if $\pi$ is the permutation which interchanges j and k and
leaves all others fixed,

$$p(x(0); \theta(-s_0)) = p(x_\pi(0); \theta_\pi(-s_0)) = p(x(0); \theta(s_0)).$$

Hence, the above ratio is $\ge 1$ if $t \ge 0$, in particular for $t = t_0$.

Conversely, if $p_j(x; \omega, \Delta) \ge p_k(x; \omega, \Delta)$ we can choose the curve $\theta(s)$ described
above and an arbitrary curve $x(t)$ such that $x_j(t) = a + \delta(t)$, $x_k(t) = a + \delta(-t)$
for some real number a and arbitrary (but fixed) coordinate $x_i$ when $i \ne j$, $i \ne k$.
Then, as before,

$$\frac{p(x(t); \theta(s_0))}{p(x(t); \theta(-s_0))} \ge 1$$

implies that $t \ge 0$, which in turn implies $x_j(t) \ge x_k(t)$. Since this holds for all
curves $x(t)$ of this form, the theorem is proved.

THEOREM 3.3: If $p(x; \theta)$ satisfies assumption (A), then any symmetric Bayes
procedure for the slippage problem has the form $\varphi_0(x) = 1$ if $x \in R_0$, a symmetric
set, and $\varphi_i(x) = 1$ if $x \notin R_0$ and $x_i > x_j$ for all $j \ne i$, $i = 1, 2, \ldots, n$. If $x \notin R_0$
and $\max_{1 \le i \le n} x_i = x_{i_1} = x_{i_2} = \cdots = x_{i_r}$, then
$\varphi_{i_1}(x) + \varphi_{i_2}(x) + \cdots + \varphi_{i_r}(x) = 1$.

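PROOF: If $\varphi$ is Bayes against an a priori symmetric distribution F, then
$\varphi_0(x) = 1$ if

$$\xi_0 \int [L_0(\omega) - L_j(\omega)]p_0(x; \omega)\,dF_0(\omega) + \xi \sum_{k=1}^n \int [L_0(k, \omega, \Delta) - L_j(k, \omega, \Delta)]\,p_k(x; \omega, \Delta)\,dF(\omega, \Delta) < 0$$

for $j = 1, 2, \ldots, n$. The symmetry assumptions on p and L clearly show that
this set is symmetric. Call this set $R_0$.

If $i \ne 0$, $\varphi_i(x) = 1$ provided

$$\xi_0 \int [L_i(\omega) - L_j(\omega)]p_0(x; \omega)\,dF_0(\omega) + \xi \sum_{k=1}^n \int [L_i(k, \omega, \Delta) - L_j(k, \omega, \Delta)]\,p_k(x; \omega, \Delta)\,dF(\omega, \Delta) < 0$$

for $j = 0, 1, \ldots, n$, $j \ne i$. If j = 0, the above inequality says that $x \notin R_0$.
If j > 0, $j \ne i$, by Lemma 3.2, $L_i(\omega) - L_j(\omega) = 0$, and $L_i(k, \omega, \Delta) - L_j(k, \omega, \Delta) = 0$
if $i \ne k$, $j \ne k$. Thus, the above inequality reduces to

$$\xi \int [L_i(i, \omega, \Delta) - L_j(i, \omega, \Delta)]p_i(x; \omega, \Delta)\,dF(\omega, \Delta) + \xi \int [L_i(j, \omega, \Delta) - L_j(j, \omega, \Delta)]p_j(x; \omega, \Delta)\,dF(\omega, \Delta) < 0.$$

The symmetry of the loss function shows that $L_i(i, \omega, \Delta) - L_j(i, \omega, \Delta) = -[L_i(j, \omega, \Delta) - L_j(j, \omega, \Delta)]$
so that the inequality becomes

$$\xi \int [L_i(i, \omega, \Delta) - L_j(i, \omega, \Delta)][p_i(x; \omega, \Delta) - p_j(x; \omega, \Delta)]\,dF(\omega, \Delta) < 0.$$

Since $L_i(i, \omega, \Delta) - L_j(i, \omega, \Delta) < 0$, assumption (A) says that this inequality
holds for $x_i > x_j$. Thus, $\varphi_i(x) = 1$ if $x \notin R_0$ and $x_i = \max_{1 \le j \le n} x_j$ (except
possibly $\varphi_i(x) < 1$ on the boundary).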

Theorem 3.3 gives the general form of the Bayes procedures, but does not
make explicit the nature of the set $R_0$ (the set where $H_0$ will be accepted) aside
from the fact that $R_0$ is symmetric. The character of this set depends strongly
on the form of the density function. In the next section we will consider several 
specific densities of importance in their own right, and the slippage problems 
connected with them. Some of these examples have been studied previously 
in the literature, while the others are new. All of them are easy examples of our 
general unified approach. 







4. Specific examples. In this section we will examine the form of the Bayes 
solutions to specific slippage problems. The discussion of these cases is written
in detail to exemplify the method of analysis. In most of the situations we will 
consider two cases which we will label the uncontrolled case and the controlled 
case. The controlled case refers to the situation where in addition to an observa- 
tion from each of the n densities, we have an observation from a density whose 
parameter is known not to have slipped. 

For reasons of symmetry, the controlled case will be easier to treat, and we 
will usually examine that case in detail and merely state the result in the uncon- 
trolled case. Moreover, whenever the problem possesses a natural invariance 
structure with respect to a group of transformations, we will then automatically 
restrict ourselves exclusively to those procedures invariant under the induced 
group of transformations acting on the decision space. 

4.1. The normal density with known variance. 

(a) The controlled case. Suppose we have independent observations
$X_1, X_2, \ldots, X_n, Y$ with $X_i$ (each $X_i$ and Y usually represent a sufficient
statistic, the average sample values, based on several observations) having a
normal distribution with mean $\theta_i$ and variance 1, and Y having a normal
distribution with mean $\theta$ and variance 1. In addition to the assumptions made on
the loss function in Section 2 we will assume that for every real number c,

$$L_i(\theta_1 + c, \ldots, \theta_n + c, \theta + c) = L_i(\theta_1, \ldots, \theta_n, \theta), \quad i = 0, 1, \ldots, n.$$

Since the problem is invariant under translation of each observation by a fixed
amount it is reasonable to look only at those procedures which depend upon

$$U_1 = X_1 - Y,\ U_2 = X_2 - Y,\ \ldots,\ U_n = X_n - Y.$$

(The statistic U represents a maximal invariant under the translation group.)
For a discussion of invariance we refer to ([12], Chaps. 6-7).

The joint density of $U_1, U_2, \ldots, U_n$ is given by

$$p(u_1, \ldots, u_n; \omega_1, \ldots, \omega_n) = C \exp\left[-\tfrac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \lambda^{ij}(u_i - \omega_i)(u_j - \omega_j)\right],$$

where C is a constant independent of the $\omega_i$ ($\omega_i = \theta_i - \theta$), and

$$\lambda^{ij} = \begin{cases} n/(n + 1) & \text{if } i = j, \\ -1/(n + 1) & \text{if } i \ne j. \end{cases}$$

Assumption (A) can be verified by showing that the density has a monotone
likelihood ratio along the curves $u(t)$, $\omega(s)$ defined by

$u_j = a + t$, $u_k = a - t$, all other $u_i$ fixed;

$\omega_j = b + s$, $\omega_k = b - s$, all other $\omega_i$ fixed.

In fact,

$$p(u(t); \omega(s)) = f(t)g(s)e^{\kappa st}, \quad \kappa > 0,$$

which clearly has a monotone likelihood ratio. Thus, the form of the Bayes
solutions is as described in Theorem 3.3. Only the set $R_0$ must be characterized.
In order to characterize $R_0$ we will make an additional assumption concerning
the losses, namely,

(B) $L_0(k, \omega, \Delta) = L_j(k, \omega, \Delta)$ for all $j \ne k$ and $\Delta > 0$.

In the present problem the real parameter $\omega$ reduces to the single value 0,
on account of the translation invariance; and we may write $L_j(k, \omega, \Delta)$ simply
as $L_j(k, \Delta)$.

$R_0$ is an intersection of sets of the form

$$\sum_{k=0}^n \int [L_0(k, \Delta) - L_j(k, \Delta)]p(u; k, \Delta)\,dF(k, \Delta) < 0.$$

Using the notation of Section 3 this can be written

$$\xi_0(L_0 - L_j)p_0(u) + \xi \sum_{k=1}^n \int_{0+}^\infty [L_0(k, \Delta) - L_j(k, \Delta)]p_k(u; \Delta)\,dG(\Delta) < 0$$

(the first term $L_0 - L_j$ corresponds to the parameter point $\Delta = 0$). Under
assumption (B) this reduces to

$$\xi_0(L_0 - L_j)p_0(u) + \xi \int_{0+}^\infty [L_0(j, \Delta) - L_j(j, \Delta)]p_j(u; \Delta)\,dG(\Delta) < 0.$$


Using the fact that $p_j(u; \Delta) = p_0(u) \exp[-\tfrac{1}{2}\lambda^{jj}\Delta^2 + \Delta h_j(u)]$, where

$$h_j(u) = \sum_{i=1}^n \lambda^{ij}u_i = u_j - \frac{n}{n + 1}\bar{u}, \qquad \bar{u} = \frac{1}{n}\sum_{i=1}^n u_i,$$

the inequality becomes

$$p_0(u)\left\{\xi_0(L_0 - L_j) + \xi \int_{0+}^\infty [L_0(j, \Delta) - L_j(j, \Delta)] \exp[-\tfrac{1}{2}\lambda^{jj}\Delta^2 + \Delta h_j(u)]\,dG(\Delta)\right\} < 0.$$

We see (by virtue of condition 2' of Section 2) that the quantity in brackets
is a monotone function of $h_j(u)$, so the inequality is equivalent to $h_j(u) < c$,
or $u_j - (n/(n + 1))\bar{u} < c$. Thus, every Bayes solution has the form

$$\varphi_0(u) = 1 \text{ if } \max_{1 \le i \le n}\left(u_i - \frac{n}{n + 1}\bar{u}\right) < c,$$

$$\varphi_i(u) = 1 \text{ if } \max_{1 \le i \le n}\left(u_i - \frac{n}{n + 1}\bar{u}\right) > c \text{ and } u_i > u_j \text{ for all } j \ne i.$$

³ Assumption (B) is reasonable in many situations and leads to a tractable explicit solution
to the problem. Nonetheless our method can be employed in the general case without
this assumption, but then the solution is only expressible implicitly.

In terms of the original observations, the procedure is of the form

$$\varphi_0(x_1, \ldots, x_n, y) = 1 \text{ if } \max_{1 \le i \le n}\left(x_i - \frac{n\bar{x} + y}{n + 1}\right) < c,$$

$$\varphi_i(x_1, \ldots, x_n, y) = 1 \text{ if } \max_{1 \le i \le n}\left(x_i - \frac{n\bar{x} + y}{n + 1}\right) > c \text{ and } x_i > x_j \text{ for all } j \ne i.$$
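
The controlled procedure just described can be sketched as a small function. This is an illustration rather than the authors' algorithm: the critical number c is taken as given, ties for the maximum are broken by taking the first index, and the boundary case where the maximum equals c is treated as a rejection, whereas the theory leaves randomization open on these sets. The function name and the sample values are hypothetical.

```python
def controlled_slippage_decision(x, y, c):
    """Return 0 to accept hypothesis zero, else the 1-based index of the population
    declared to have slipped to the right, using the control observation y."""
    n = len(x)
    pooled = (sum(x) + y) / (n + 1)  # equals (n * xbar + y) / (n + 1)
    best = max(range(n), key=lambda i: x[i])  # first index attaining the maximum
    if x[best] - pooled < c:
        return 0
    return best + 1


# Hypothetical sample means and critical number.
print(controlled_slippage_decision([0.1, -0.2, 3.0], 0.0, 1.5))
print(controlled_slippage_decision([0.1, -0.2, 0.3], 0.0, 1.5))
```

Adding the same constant to every observation, including the control, leaves the decision unchanged, which reflects the translation invariance imposed on the problem.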
(b) Uncontrolled case. (Paulson [8].) This example was first treated by Paulson
and now emerges as a special case of our theory. For the uncontrolled problem
we assume that, for all real numbers $c$,

$$L_i(\theta_1 + c, \ldots, \theta_n + c) = L_i(\theta_1, \ldots, \theta_n),$$

and also condition (B) of part (a). Restricting attention to invariant procedures
leads to considering decision functions on the variables $U_1 = X_1 - \bar{X}$,
$U_2 = X_2 - \bar{X}, \ldots, U_n = X_n - \bar{X}$, where $\bar{X} = (1/n)\sum_{i=1}^{n} X_i$.
The analysis proceeds in a manner similar to case (a) above, and actually factors
in simpler terms. Every symmetric Bayes procedure for the slippage problem in
terms of the $u_i$ has the form

$$\varphi_0(u_1, \ldots, u_n) = 1 \quad\text{if}\quad \max_{1 \le i \le n} u_i < c,$$

$$\varphi_i(u_1, \ldots, u_n) = 1 \quad\text{if}\quad \max_{1 \le j \le n} u_j > c \ \text{ and } \ u_i > u_j \ \text{ for all } j \ne i.$$

In terms of the original observations

$$\varphi_0(x_1, \ldots, x_n) = 1 \quad\text{if}\quad \max_{1 \le i \le n} (x_i - \bar{x}) < c,$$

$$\varphi_i(x_1, \ldots, x_n) = 1 \quad\text{if}\quad \max_{1 \le j \le n} (x_j - \bar{x}) > c \ \text{ and } \ x_i > x_j \ \text{ for all } j \ne i.$$
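
The uncontrolled rule in terms of the original observations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `slippage_decision` and the convention of returning 0 for "no slippage" and a 1-based index otherwise are hypothetical, and the critical value `c` is taken as given rather than derived from a prior.

```python
from typing import Sequence


def slippage_decision(x: Sequence[float], c: float) -> int:
    """Apply the rule 'max_i (x_i - xbar) < c' for the uncontrolled normal case.

    Returns 0 when no population is declared to have slipped, otherwise the
    1-based index i of the population with the largest observation.
    (Function name and return convention are illustrative assumptions.)
    """
    n = len(x)
    mean = sum(x) / n
    deviations = [xi - mean for xi in x]
    largest = max(deviations)
    if largest < c:
        return 0  # accept the hypothesis of no slippage
    # x_i > x_j for all j != i is the same as x_i - xbar being the largest deviation
    return deviations.index(largest) + 1
```

For example, `slippage_decision([0.1, -0.2, 0.0, 3.0], 1.5)` selects the fourth population, since its deviation from the sample mean exceeds the critical value.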

4.2. The normal density with unknown variance. 

(a) Controlled case. We have independent observations $X_{1j}, X_{2j}, \ldots, X_{nj}$,
$Y_j$, $j = 1, 2, \ldots, k$, where the $X_{ij}$ are normally distributed with unknown
mean $\theta_i$ and unknown common variance $\sigma^2$, and the $Y_j$ are normally distributed
with unknown mean $\theta$ and unknown variance $\sigma^2$. For convenience of exposition
we take $k_i = k$, and for reasons of invariance we assume that, for all real numbers
$a$ and real $\beta > 0$, the loss functions satisfy

$$L_i\Big(\frac{\theta_1 + a}{\beta}, \ldots, \frac{\theta_n + a}{\beta}, \frac{\theta + a}{\beta}, \frac{\sigma}{\beta}\Big) = L_i(\theta_1, \ldots, \theta_n, \theta, \sigma).$$

For reasons of invariance it is reasonable to consider only those procedures which
depend on

$$U_1 = (\bar{X}_1 - \bar{Y})/S, \ \ldots, \ U_n = (\bar{X}_n - \bar{Y})/S,$$

where

$$\bar{X}_i = \sum_{j=1}^{k} X_{ij}/k, \qquad \bar{Y} = \sum_{j=1}^{k} Y_j/k, \qquad S^2 = \sum_{i=1}^{n}\sum_{j=1}^{k} (X_{ij} - \bar{X}_i)^2 + \sum_{j=1}^{k} (Y_j - \bar{Y})^2.$$







The U; constitute a maximal invariant with respect to the affine group of 
transformations of the real line into itself, under which the problem is invariant. 
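The joint density of $U_1, U_2, \ldots, U_n$ can be written as

$$p(u; \eta) = c \int_0^{\infty} \exp\Big[-\frac{k}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda^{ij} (u_i s - \eta_i)(u_j s - \eta_j) - \frac{s^2}{2}\Big]\, s^{(n+1)(k-1)+n-1}\, ds,$$

where

$$\lambda^{ij} = 1 - \frac{1}{n+1} \ \text{ if } i = j, \qquad \lambda^{ij} = -\frac{1}{n+1} \ \text{ if } i \ne j,$$

and $\eta_i = (\theta_i - \theta)/\sigma$.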


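For this problem let us verify assumption (A) directly. If $p_i(u; \delta)$ denotes the
density when $H_i$ is true and $\delta = \Delta/\sigma$, we may write

$$p_i(u; \delta) = C \cdot f(\delta)\, g(u) \int_0^{\infty} \exp[\delta Z_i t - \tfrac{1}{2} t^2]\, t^{(n+1)(k-1)+n-1}\, dt,$$

where

$$f(\delta) = \exp\Big[-\frac{1}{2}\,\frac{nk}{n+1}\,\delta^2\Big], \qquad g(u) = 1 \Big/ \Big(k \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda^{ij} u_i u_j + 1\Big)^{[(n+1)(k-1)+n]/2},$$

and

$$Z_i = \Big(k \sum_{j=1}^{n} \lambda^{ij} u_j\Big) \Big/ \Big(k \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda^{ij} u_i u_j + 1\Big)^{1/2}.$$

Thus,

$$p_i(u; \delta) - p_j(u; \delta) = C \cdot f(\delta)\, g(u) \int_0^{\infty} \big[e^{\delta Z_i t} - e^{\delta Z_j t}\big]\, e^{-t^2/2}\, t^{(n+1)(k-1)+n-1}\, dt,$$

so that $p_i(u; \delta) - p_j(u; \delta) \ge 0$ if and only if $Z_i \ge Z_j$, and it is easy to establish
that the latter is equivalent to $u_i \ge u_j$. Hence, assumption (A) holds and the
form of the Bayes solutions is determined except for the set $R_0$.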


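To represent the set $R_0$, we again postulate that (B) is valid for the problem
in terms of the $U_i$. Then, $R_0$ is an intersection of sets of the form

$$\xi_0 (L_0 - L_j)\, p_0(u) + \xi \int_{0+}^{\infty} [L_0(j, \delta) - L_j(j, \delta)]\, p_j(u; \delta)\, dF(\delta) < 0.$$

Inserting the explicit expressions of $p_j$, the inequality becomes

$$g(u) \int_{0+}^{\infty} \int_0^{\infty} \big\{ \xi [L_0(j, \delta) - L_j(j, \delta)]\, f(\delta)\, e^{\delta Z_j t} - \xi_0 (L_j - L_0) \big\}\, e^{-t^2/2}\, t^{(n+1)(k-1)+(n-1)}\, dt\, dF(\delta) < 0.$$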







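Since the integral (by virtue of condition 2' of Section 2) is a strictly increasing
function of $Z_j$, the inequality is equivalent to $Z_j < c$, or

$$\frac{u_j - \frac{n}{n+1}\,\bar{u}}{\Big[1 + k \sum_{i=1}^{n} u_i^2 - \frac{k}{n+1}\big(\sum_{i=1}^{n} u_i\big)^2\Big]^{1/2}} < c.$$

Thus, every Bayes procedure for the problem in the variables $u_i$ has the form

$$\varphi_0(u) = 1 \quad\text{if}\quad \max_{1 \le j \le n} \frac{u_j - \frac{n}{n+1}\,\bar{u}}{\Big[1 + k \sum_{i=1}^{n} u_i^2 - \frac{k}{n+1}\big(\sum_{i=1}^{n} u_i\big)^2\Big]^{1/2}} < c,$$

$$\varphi_i(u) = 1 \quad\text{if}\quad \max_{1 \le j \le n} \frac{u_j - \frac{n}{n+1}\,\bar{u}}{\Big[1 + k \sum_{i=1}^{n} u_i^2 - \frac{k}{n+1}\big(\sum_{i=1}^{n} u_i\big)^2\Big]^{1/2}} > c \ \text{ and } \ u_i > u_j \ \text{ for all } j \ne i.$$

In terms of the original observations a slight calculation shows that $\varphi_0(X_{ij}, Y_j) = 1$ if

$$\max_{1 \le i \le n} \frac{\bar{X}_i - \frac{n\bar{X} + \bar{Y}}{n+1}}{\Big[S^2 + k \sum_{i=1}^{n} (\bar{X}_i - \bar{X})^2 + \frac{nk}{n+1}(\bar{X} - \bar{Y})^2\Big]^{1/2}} < c,$$

and $\varphi_i(X_{ij}, Y_j) = 1$ if the above max is $> c$ and $\bar{X}_i > \bar{X}_j$ for all $j \ne i$.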

(b) Uncontrolled case (compare with Paulson [8]). Here we have independent
observations $X_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$, where $X_{ij}$ is normally
distributed with unknown mean $\theta_i$ and unknown common variance $\sigma^2$. Again,
we assume that the losses are invariant if a constant is added to each $\theta_i$, and
if each $\theta_i$ and $\sigma$ are multiplied by the same positive constant. Then, invariance
considerations tell us to look at procedures based on

$$U_1 = (\bar{X}_1 - \bar{X})/S, \ \ldots, \ U_n = (\bar{X}_n - \bar{X})/S,$$

where $S^2 = \sum_{i=1}^{n}\sum_{j=1}^{k} (X_{ij} - \bar{X}_i)^2$. An analysis analogous to, and simpler than,
that in (a) above shows that the symmetric Bayes procedures for the problem in terms
of the variables $U_i$ are all of the form

$$\varphi_0(u) = 1 \quad\text{if}\quad \max_{1 \le i \le n} u_i \Big/ \Big(k \sum_{l=1}^{n} u_l^2 + 1\Big)^{1/2} < c,$$

$$\varphi_i(u) = 1 \quad\text{if}\quad \max_{1 \le j \le n} u_j \Big/ \Big(k \sum_{l=1}^{n} u_l^2 + 1\Big)^{1/2} > c \ \text{ and } \ u_i > u_j \ \text{ for } j \ne i.$$







In terms of the original variables this becomes 


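$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} (\bar{X}_i - \bar{X}) \Big/ \Big\{\sum_{i=1}^{n}\sum_{j=1}^{k} (X_{ij} - \bar{X})^2\Big\}^{1/2} < c,$$

$$\varphi_i(x) = 1 \quad\text{if}\quad \max_{1 \le j \le n} (\bar{X}_j - \bar{X}) \Big/ \Big\{\sum_{i=1}^{n}\sum_{j=1}^{k} (X_{ij} - \bar{X})^2\Big\}^{1/2} > c$$

and $\bar{X}_i > \bar{X}_j$ for all $j \ne i$.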
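
As a hedged illustration of the rule just stated in the original variables, the following Python sketch computes the studentized deviations $(\bar{X}_i - \bar{X})/\{\sum\sum (X_{ij} - \bar{X})^2\}^{1/2}$ and applies the decision rule. The names `studentized_deviations` and `studentized_slippage_decision` are hypothetical, equal sample sizes $k$ are assumed as in the text, and `c` is supplied by the user.

```python
import math
from typing import List, Sequence


def studentized_deviations(samples: Sequence[Sequence[float]]) -> List[float]:
    """Return (mean_i - grand_mean) / sqrt(total sum of squares) for each sample.

    `samples[i]` holds the k observations from population i (equal k assumed).
    """
    n = len(samples)
    k = len(samples[0])
    means = [sum(s) / k for s in samples]
    grand_mean = sum(means) / n
    total_ss = sum((v - grand_mean) ** 2 for s in samples for v in s)
    return [(m - grand_mean) / math.sqrt(total_ss) for m in means]


def studentized_slippage_decision(samples: Sequence[Sequence[float]], c: float) -> int:
    """Return 0 for 'no slippage' or the 1-based index of the selected population."""
    stats = studentized_deviations(samples)
    largest = max(stats)
    if largest < c:
        return 0
    return stats.index(largest) + 1
```

The denominator uses the total sum of squares about the grand mean, which equals $S^2 + k\sum_i (\bar{X}_i - \bar{X})^2$, so the statistic agrees with $u_i/(k\sum u_l^2 + 1)^{1/2}$ from the invariant form.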


4.3. The gamma distributions. 
(a) Controlled case. Suppose we obtain observations $X_1, X_2, \ldots, X_n, Y$
such that the $X_i$ and $Y$ are independent and $X_i$ has density

$$\frac{1}{\Gamma(p)\,\theta_i^{p}}\, e^{-x/\theta_i}\, x^{p-1}$$

and $Y$ has density

$$\frac{1}{\Gamma(p)\,\theta^{p}}\, e^{-y/\theta}\, y^{p-1}.$$

(Here $p$ is a fixed positive parameter.) We assume that the losses under scale
transformations satisfy the invariance condition

$$L_j(a\theta_1, \ldots, a\theta_n, a\theta) = L_j(\theta_1, \ldots, \theta_n, \theta)$$


for all a > 0. In addition, we will assume that condition (B) holds. We see 
clearly that the problem remains invariant under the transformation which 
multiplies each observation by the same positive real number a, and as usual 
we will consider only invariant procedures. That is, procedures depending only on 


$$U_1 = X_1/Y, \ \ldots, \ U_n = X_n/Y.$$

The joint density of $U_1, \ldots, U_n$ is

$$p(u; \omega) = c \prod_{i=1}^{n} \frac{u_i^{p-1}}{\omega_i^{p}} \Big/ \Big(1 + \sum_{i=1}^{n} \frac{u_i}{\omega_i}\Big)^{(n+1)p},$$

where $\omega_i = \theta_i/\theta$. Assumption (A) may be checked by showing that $p(u; \omega)$
has a monotone likelihood ratio with respect to the curves

$u_i = a + t$, $u_j = a - t$ $(0 \le t \le a)$, while all other coordinates stay fixed;
$\omega_i = b + s$, $\omega_j = b - s$ $(0 \le s \le b)$, while all other coordinates stay fixed.

This is readily established by direct calculation. To characterize the set $R_0$
we take the intersection of the sets defined by the inequalities

$$\xi_0 (L_0 - L_j)\, p_0(u) + \xi \int_{0+}^{\infty} [L_0(j, \delta) - L_j(j, \delta)]\, p_j(u; \delta)\, dF(\delta) < 0,$$

where $\delta = \Delta/\theta$. ($L_0 - L_j$ denotes the difference of the loss functions when
taking action 0 and action $j$, where the true hypothesis is zero.)







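This inequality can be written as

$$p_0(u) \int_{0+}^{\infty} \Big\{ \xi [L_0(j, \delta) - L_j(j, \delta)]\, (1 + \delta)^{-p} \Big[1 - \frac{\delta}{1+\delta}\, \frac{u_j}{1 + \sum_{i=1}^{n} u_i}\Big]^{-(n+1)p} - \xi_0 (L_j - L_0) \Big\}\, dF(\delta) < 0.$$

The integral expression is clearly a strictly increasing function of
$u_j/(1 + \sum_{i=1}^{n} u_i)$ and hence the inequality is equivalent to

$$u_j \Big/ \Big(\sum_{i=1}^{n} u_i + 1\Big) < c \quad\text{or}\quad x_j \Big/ \Big(\sum_{i=1}^{n} x_i + y\Big) < c.$$

Thus, the symmetric Bayes solution has the form

$$\varphi_0(x, y) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i \Big/ \Big(\sum_{j=1}^{n} x_j + y\Big) < c,$$

$$\varphi_i(x, y) = 1 \quad\text{if}\quad \max_{1 \le j \le n} x_j \Big/ \Big(\sum_{l=1}^{n} x_l + y\Big) > c \ \text{ and } \ x_i > x_j \ \text{ for all } j \ne i.$$

(b) Uncontrolled case. A special case of this example was treated in [11].
The corresponding symmetric Bayes solutions in the uncontrolled case have
the form

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i \Big/ \sum_{j=1}^{n} x_j < c,$$

$$\varphi_i(x) = 1 \quad\text{if}\quad \max_{1 \le j \le n} x_j \Big/ \sum_{l=1}^{n} x_l > c \ \text{ and } \ x_i > x_j \ \text{ for all } j \ne i.$$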
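
A minimal Python sketch of the gamma-family rules above, covering both the controlled and the uncontrolled case. The function name `gamma_slippage_decision` and the optional argument `y` (the control observation, omitted in the uncontrolled case) are illustrative assumptions; the critical value `c` is taken as given.

```python
from typing import Optional, Sequence


def gamma_slippage_decision(x: Sequence[float], c: float,
                            y: Optional[float] = None) -> int:
    """Scale-slippage rule based on max_i x_i / (sum_j x_j [+ y]).

    Pass `y` for the controlled case; leave it as None for the uncontrolled
    case. Returns 0 for 'no slippage' or the 1-based index of the largest x_i.
    """
    total = sum(x) + (y if y is not None else 0.0)
    shares = [xi / total for xi in x]
    largest = max(shares)
    if largest < c:
        return 0
    return shares.index(largest) + 1
```

With observations `[1.0, 2.0, 7.0]` and `c = 0.5`, the third population is selected in the uncontrolled case (its share is 0.7), whereas adding a control observation `y = 10.0` lowers the share to 0.35 and the hypothesis of no slippage is accepted.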


5. Translation parameter slippage problem. 

5.1. The general form of the invariant Bayes solutions. Assume that $X_1, X_2, \ldots, X_n$
are independently distributed according to the densities $p(x - \theta_1), \ldots,
p(x - \theta_n)$ respectively, and suppose that $Y$ is independent of $(X_1, \ldots, X_n)$
and has density $p(y - \theta)$. Here, the variable $x$ and the parameter $\theta$ traverse
the real line. It is possible to develop a corresponding theory in the case where
$x$ and $\theta$ are integer valued. However, for the sake of exposition, we have limited
our discussion to the case of continuous variables. The densities differ only
in their location parameter.

The assumptions on the losses are the same as in the preceding problems which
dealt with the controlled case. In addition the losses will be assumed to satisfy
the invariance property

$$L_i(\theta_1 + c, \ldots, \theta_n + c, \theta + c) = L_i(\theta_1, \ldots, \theta_n, \theta)$$

for all real numbers $c$. A maximal invariant for the problem is then $U_1 = X_1 - Y,
\ldots, U_n = X_n - Y$.

We will assume throughout this section that the density $p(x - \theta)$ has a
monotone likelihood ratio (abbreviated M.L.R.), i.e.,







$$p(x_1 - \theta_1)\, p(x_2 - \theta_2) \ge p(x_1 - \theta_2)\, p(x_2 - \theta_1)$$

whenever

$$x_1 < x_2 \quad\text{and}\quad \theta_1 < \theta_2.$$

The class of distributions which possess a M.L.R. with respect to a translation
parameter includes all P.F.F.'s [4], any non-central $\chi^2$, any non-central $t$, etc.
Most distributions arising in statistical practice are of this kind. For convenience
of exposition we shall assume henceforth that $p(x)$ is strictly positive. All our
discussion will remain valid if we merely take $p(x)$ non-negative and positive on
some interval. This involves a tedious consideration of cases with no essential
new ideas.
The joint density of $U_1, \ldots, U_n$ is

$$q(u; \omega) = \int_{-\infty}^{\infty} \prod_{i=1}^{n} p(u_i - \omega_i + t)\, p(t)\, dt,$$

where $\omega_i = \theta_i - \theta$. To check assumption (A) we note that

$$q_j(u; \Delta) - q_k(u; \Delta) = \int_{-\infty}^{\infty} \prod_{i=1}^{n} p(u_i + t) \cdot \Big[\frac{p(u_j - \Delta + t)}{p(u_j + t)} - \frac{p(u_k - \Delta + t)}{p(u_k + t)}\Big]\, p(t)\, dt.$$

Since $p(u - \theta)$ has a monotone likelihood ratio, then for every $t$, the quantity
in brackets is greater than or equal to zero if and only if $u_j \ge u_k$. Thus, assumption
(A) holds, and we know the form of the Bayes solutions except for the set
$R_0$. The set $R_0$ is an intersection of the sets of $u$ values satisfying the inequalities

$$\xi_0 (L_0 - L_j)\, q_0(u) + \xi \int_{0+}^{\infty} [L_0(j, \Delta) - L_j(j, \Delta)]\, q_j(u; \Delta)\, dF(\Delta) < 0,$$

or

$$\xi_0 (L_0 - L_j) \int_{-\infty}^{\infty} \prod_{i=1}^{n} p(u_i + t)\, p(t)\, dt + \xi \int_{0+}^{\infty} [L_0(j, \Delta) - L_j(j, \Delta)] \int_{-\infty}^{\infty} \prod_{i=1}^{n} p(u_i + t)\, \frac{p(u_j - \Delta + t)}{p(u_j + t)}\, p(t)\, dt\, dF(\Delta) < 0.$$

If we interchange the order of integration and set

$$\Phi(u) = \int_{0+}^{\infty} \Big\{ \xi [L_0(j, \Delta) - L_j(j, \Delta)]\, \frac{p(u - \Delta)}{p(u)} - \xi_0 (L_j - L_0) \Big\}\, dF(\Delta),$$

we may write the inequality as

$$\int_{-\infty}^{\infty} \Phi(u_j + t) \prod_{i=1}^{n} p(u_i + t)\, p(t)\, dt < 0,$$

where $\Phi(u)$ is monotone increasing. Now consider the curve $u_i = \lambda$,
$i = 1, 2, \ldots, n$. For $u$ on this curve the inequality becomes

$$\int_{-\infty}^{\infty} \Phi(\lambda + t)\, [p(\lambda + t)]^{n}\, p(t)\, dt = \int_{-\infty}^{\infty} \Phi(u)\, [p(u)]^{n}\, p(u - \lambda)\, du < 0.$$

Now, $\Phi(u)[p(u)]^n$ changes sign at most once, and $p(u - \lambda)$ has a monotone
likelihood ratio. Hence, by Lemma 3.4, $\int_{-\infty}^{\infty} \Phi(u)\, p^n(u)\, p(u - \lambda)\, du$ changes sign
at most once in $\lambda$ (from negative to positive values). Let $\lambda_0$ be the value at which
it changes sign. If there is an interval of $\lambda$ values where the integral is zero,
then define $\lambda_0$ to be the smallest value of this set. Let $\lambda_0 = \pm\infty$ respectively if
the integral is always negative or always positive.

THEOREM 5.1:

$$R_0 \subset \{u \mid \max_{1 \le i \le n} u_i < \lambda_0\},$$

where $\lambda_0$ is defined in the preceding paragraph.

PROOF:

Case I: $\lambda_0 = +\infty$. The theorem is trivially true in this case.

Case II: $\lambda_0 = -\infty$. This means that $\int_{-\infty}^{\infty} \Phi(u)\, p^n(u)\, p(u - \lambda)\, du$ is positive
for all real $\lambda$. We must show that $R_0$ is the empty set. Suppose that $u \in R_0$ and
that $u_1$ is the maximum coordinate of $u$. Now,

$$\int_{-\infty}^{\infty} \Phi(u_1 + t) \prod_{i=1}^{n} p(u_i + t)\, p(t)\, dt = \int_{-\infty}^{\infty} p(z)\, \Phi(z) \prod_{i=2}^{n} p(u_i - u_1 + z)\, p(z - u_1)\, dz$$

changes sign at most once in $u_1$ when the other variables are held fixed, since
$\prod_{i=2}^{n} p(u_i + z - u_1)\, p(z - u_1)$ has a M.L.R. in the variables $z$ and $u_1$. Hence,
there exists a point $u'$, having the same components as $u$ except for the
first component, where $u_1'$ is determined so that $u_1' = \max_{2 \le j \le n} u_j$, and which belongs
to $R_0$. This is the case, since the above integral is negative for the point $u$, and
remains negative when the first component $u_1$ is decreased. Continuing in this
fashion we may show that a point having all coordinates equal belongs to $R_0$,
which contradicts the assumption that $\lambda_0 = -\infty$.

Case III: $\lambda_0$ is finite.

Consider a point $u \notin \{u \mid \max_{1 \le i \le n} u_i < \lambda_0\}$. For definiteness suppose $u_1 =
\max_{1 \le i \le n} u_i$. Consider a curve $\Gamma$ ($u_j = u_j(s)$, $j = 1, \ldots, n$), passing through
the points $(\lambda_0, \lambda_0, \ldots, \lambda_0)$ and $u$, with the property that $u_i(s) - u_1(s)$ is decreasing
for $i = 2, 3, \ldots, n$, and $u_1(s)$ is increasing. Along this curve it is easily verified
that the density $\prod_{i=2}^{n} p[z - (u_1(s) - u_i(s))]\, p(z - u_1(s))$ has a monotone likelihood
ratio in the variables $z$ and $s$. Then

$$\int_{-\infty}^{\infty} \Phi[u_1(s) + t] \prod_{i=1}^{n} p[u_i(s) + t]\, p(t)\, dt = \int_{-\infty}^{\infty} \Phi(z)\, p(z) \prod_{i=2}^{n} p[z - (u_1(s) - u_i(s))]\, p[z - u_1(s)]\, dz$$

changes sign at most once. But since it changes sign at $(\lambda_0, \lambda_0, \ldots, \lambda_0)$, the point
$u \notin R_0$.

In general, we cannot make more explicit the nature of the set $R_0$. However,
when there exists a one-dimensional sufficient statistic, a more precise characterization
of $R_0$ is possible. This is done in the following paragraph.

5.2. Form of the Bayes solutions when there is a sufficient statistic. In this section, 
by way of variation, we shall discuss the uncontrolled problem and state without 
proof the corresponding conclusions in the case of the controlled problem. 

We suppose that $p(x)$ is bounded, and we may assume for convenience that $p$
has its maximum at $x = 0$. For, if it has a maximum at $x = x_0$ we can relabel
the parameters so that $\theta' = \theta + x_0$. In addition we will suppose that there is a
statistic $T = T(x_1, \ldots, x_n)$ which is sufficient for $\theta$ when $\theta_1 = \theta_2 = \cdots =
\theta_n = \theta$. That is, the likelihood function can be written

$$\prod_{i=1}^{n} p(x_i - \theta) = r(x)\, q(T; \theta).$$

Since the maximum likelihood estimate (M.L.E.), $\hat\theta$, is a fortiori sufficient we
can take $T = \hat\theta$. The following lemma is now immediate.

LEMMA 5.2.1: The maximum likelihood estimate of $\theta$, $\hat\theta$, is translation invariant,
i.e., $\hat\theta(x_1 + c, \ldots, x_n + c) = \hat\theta(x_1, \ldots, x_n) + c$, and $\prod_{i=1}^{n} p(x_i - \theta)$ can be
written as $r(x)\, q(\hat\theta - \theta)$.

LEMMA 5.2.2: $q(\hat\theta - \theta)$ has a monotone likelihood ratio.

PROOF: We assume for simplicity of exposition that $p$ is positive everywhere.
The general situation can be handled by a tedious enumeration of cases. Let
$\theta_1 > \theta_2$, $\hat\theta_1 = \hat\theta(x_1, \ldots, x_n)$, $\hat\theta_2 = \hat\theta(x_1 + \delta, \ldots, x_n + \delta) = \hat\theta_1 + \delta$, where
$\delta > 0$. Then

$$\frac{q(\hat\theta_2 - \theta_1)}{q(\hat\theta_2 - \theta_2)} - \frac{q(\hat\theta_1 - \theta_1)}{q(\hat\theta_1 - \theta_2)} = \frac{\prod_{i=1}^{n} p(x_i + \delta - \theta_1)}{\prod_{i=1}^{n} p(x_i + \delta - \theta_2)} - \frac{\prod_{i=1}^{n} p(x_i - \theta_1)}{\prod_{i=1}^{n} p(x_i - \theta_2)} > 0,$$

since for each $i$, $p(x_i + \delta - \theta_1)/p(x_i + \delta - \theta_2) > p(x_i - \theta_1)/p(x_i - \theta_2)$ because
of the monotonicity of the likelihood ratio for $p(x - \theta)$.
THEOREM 5.2.1: For the uncontrolled problem the set $R_0$ has the form

$$R_0 = \{x \mid \max_{1 \le i \le n} (x_i - \hat\theta) < c\},$$

where $\hat\theta$ is the M.L.E. of $\theta$ under $H_0$: $\theta_1 = \cdots = \theta_n = \theta$.

PROOF: The maximal invariant has a density which can be written in the
symmetric form $\int_{-\infty}^{\infty} \prod_{i=1}^{n} p(x_i + t)\, dt$, and, as before, the set $R_0$ can be expressed
as an intersection of sets defined by the inequalities

$$\int_{-\infty}^{\infty} \Phi(x_j + t) \prod_{i=1}^{n} p(x_i + t)\, dt < 0, \qquad j = 1, 2, \ldots, n,$$

where $\Phi(u)$ is the same as in Paragraph 5.1. By assumption, $\hat\theta$ is sufficient for
$\theta$ so that

$$\int_{-\infty}^{\infty} \Phi(x_j + t) \prod_{i=1}^{n} p(x_i + t)\, dt = \int_{-\infty}^{\infty} r(x)\, \Phi(x_j + t)\, q(\hat\theta + t)\, dt = r(x) \int_{-\infty}^{\infty} \Phi(u)\, q(u - (x_j - \hat\theta))\, du.$$

Since $\Phi$ is monotone increasing and $q(u - (x_j - \hat\theta))$ has a monotone likelihood
ratio, the above is less than zero if $x_j - \hat\theta < c$ for some appropriate constant $c$.
Since the above is to hold for $j = 1, \ldots, n$, the set $R_0$ is determined as

$$\{x \mid \max_{1 \le i \le n} (x_i - \hat\theta) < c\}.$$

An identical result holds for the controlled problem. The formal arguments
are similar.

THEOREM 5.2.2: If the maximum likelihood estimate $\hat\theta$ is sufficient, then the
class of procedures of the form

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} (x_i - \hat\theta) < c,$$

$$\varphi_j(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} (x_i - \hat\theta) > c \ \text{ and } \ x_j > x_i \ \text{ for all } i \ne j,$$

constitutes a minimal essentially complete class of symmetric invariant procedures.

PROOF: Let $\Delta_0 = 0, \Delta_1, \Delta_2, \ldots$ be a dense set of points which includes all
points of discontinuity of the function $\gamma(\Delta) = L_0(i, \Delta) - L_i(i, \Delta)$, which is
clearly independent of $i$. Consider any symmetric invariant Bayes procedure
$\varphi^{(m)}$ which improves on $\varphi$ at $\Delta_0, \Delta_1, \ldots, \Delta_m$. That is,

$$\rho(\varphi, \Delta_i) - \rho(\varphi^{(m)}, \Delta_i) \ge 0 \quad\text{for } i = 0, 1, \ldots, m,$$

where

$$\rho(\varphi, 0) - \rho(\varphi^{(m)}, 0) = -\gamma_0 \int (\varphi_0 - \varphi_0^{(m)})\, p_0(x)\, dx,$$

for $\gamma_0 = L_i(0) - L_0(0)$, and for $\Delta > 0$

$$\rho(\varphi, \Delta) - \rho(\varphi^{(m)}, \Delta) = -\gamma(\Delta) \int (\varphi_i - \varphi_i^{(m)})\, p_i(x; \Delta)\, dx,$$

where $p_i(x; \Delta)$ refers to the density for which the $i$th parameter has slipped an
amount $\Delta$ and the right hand side is clearly independent of $i$ by symmetry
considerations. Since we have a sequence $\{\varphi^{(m)}\}$ of procedures which improve
on $\varphi$ in terms of risk at $\Delta_0, \Delta_1, \ldots, \Delta_m$, and $\varphi^{(m)}$ is determined by a real
number $c_m$, i.e.,

$$\varphi_0^{(m)}(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i - \hat\theta < c_m,$$

$$\varphi_j^{(m)}(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i - \hat\theta > c_m \ \text{ and } \ x_j > x_i \ \text{ for all } i \ne j,$$

we can choose a limiting procedure which is of the same form, and which improves
at all points of the enumerable dense set. Since any $\Delta \notin \{\Delta_m\}$ is a point
of continuity of $\gamma(\Delta)$, it will follow that this limiting procedure has risk no
larger than the risk of $\varphi$ at all points $\Delta$. This shows that the class of procedures
given above is essentially complete.

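To establish minimal completeness we must show that, of any two procedures in
this class, neither dominates the other. Let $\varphi^1$ and $\varphi^2$ be two procedures determined
by the critical numbers $c_1$ and $c_2$. For definiteness, suppose $c_1 > c_2$. Then

$$\rho(\varphi^1, 0) - \rho(\varphi^2, 0) = -\gamma_0 \int (\varphi_0^1 - \varphi_0^2)\, p_0(x)\, dx = -\gamma_0\, P\{c_2 < \max_{1 \le i \le n} (X_i - \hat\theta) < c_1\} < 0,$$

and

$$\rho(\varphi^1, \Delta) - \rho(\varphi^2, \Delta) = -\gamma(\Delta) \int (\varphi_i^1 - \varphi_i^2)\, p_i(x; \Delta)\, dx$$

$$= \gamma(\Delta)\, P\{X_i = \max_{1 \le j \le n} X_j, \text{ and } c_2 < \max_{1 \le j \le n} (X_j - \hat\theta) < c_1, \text{ given that the } i\text{th parameter has slipped by } \Delta\} > 0.$$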


We state without proof that the results corresponding to Theorems 5.2.1 and
5.2.2 apply to the symmetric two-sided slippage problem. Namely, every symmetric
invariant Bayes procedure is characterized as follows:

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} |x_i - \hat\theta| < c,$$

and

$$\varphi_j(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} |x_i - \hat\theta| > c \ \text{ and } \ |x_j - \hat\theta| > |x_i - \hat\theta| \ \text{ for all } i \ne j.$$


We now indicate two illustrations of Theorems 5.2.1 and 5.2.2. We discuss 
first an example treated earlier by direct methods. Example 2 below offers a new 
example of our theory. Except for small variations, these are the unique examples 
of the theory since the only distributions which admit a sufficient statistic 
under independent observations and for which the parameter occurs in transla- 
tion form are the distributions of these illustrations. 

EXAMPLE 1: Normal distribution with known variance. 

For both the controlled and uncontrolled problem the maximum likelihood
estimate is sufficient. Consequently, by virtue of Theorem 5.2.2 a minimal
complete class of procedures in the uncontrolled case consists of all procedures
having the form:

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i - \bar{x} < c,$$

$$\varphi_j(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i - \bar{x} > c \ \text{ and } \ x_j > x_i \ \text{ for all } i \ne j.$$






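In the controlled case the minimal complete class of procedures was characterized
in terms of the statistic $\max_{1 \le i \le n} u_i - [n/(n+1)]\bar{u}$ where $u_i =
x_i - y$. We observe here, as mentioned earlier, that this can be expressed as
$\max x_i - \hat\theta$, where $\hat\theta = (n\bar{x} + y)/(n+1)$ is the maximum likelihood estimate
of $\theta$ when $\theta_1 = \cdots = \theta_n = \theta$.

EXAMPLE 2: Exponential distribution.

Let us take $p(x - \theta) = e^{-(x - \theta)}\, \psi(x - \theta)$ where

$$\psi(u) = 0 \ \text{ if } u < 0, \qquad \psi(u) = 1 \ \text{ if } u \ge 0.$$

For the non-controlled problem $\hat\theta = \min_{1 \le i \le n} x_i$. The minimal complete class
of symmetric invariant procedures consists of all procedures having the form:

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i - \min_{1 \le i \le n} x_i < c,$$

and

$$\varphi_j(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i - \min_{1 \le i \le n} x_i > c \ \text{ and } \ x_j > x_i \ \text{ for all } i \ne j.$$

For the controlled case

$$\hat\theta = y \ \text{ if } \min_{1 \le i \le n} x_i \ge y, \qquad \hat\theta = \min_{1 \le i \le n} x_i \ \text{ if } \min_{1 \le i \le n} x_i \le y.$$

The minimal complete class of symmetric invariant procedures consists of all
procedures of the form:

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i - \min\big(\min_{1 \le i \le n} x_i,\, y\big) < c,$$

$$\varphi_j(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i - \min\big(\min_{1 \le i \le n} x_i,\, y\big) > c \ \text{ and } \ x_j > x_i \ \text{ for all } i \ne j.$$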
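
The exponential example lends itself to a short illustration. The sketch below, written under the assumption that ties have probability zero, takes $\hat\theta$ to be the smallest observation (including the control observation when one is present) and compares $\max x_i - \hat\theta$ with a user-supplied critical value. The function name `exponential_slippage_decision` and its return convention are hypothetical.

```python
from typing import Optional, Sequence


def exponential_slippage_decision(x: Sequence[float], c: float,
                                  y: Optional[float] = None) -> int:
    """Rule of Example 2: compare max(x) - theta_hat with c.

    theta_hat is min(x) in the non-controlled case and min(min(x), y) when a
    control observation y is supplied. Returns 0 for 'no slippage' or the
    1-based index of the largest observation.
    """
    theta_hat = min(x) if y is None else min(min(x), y)
    largest = max(x)
    if largest - theta_hat < c:
        return 0
    return list(x).index(largest) + 1
```

With `x = [1.0, 1.5, 2.0]` and `c = 1.8`, the range is 1.0 and no slippage is declared; supplying a control value `y = 0.0` moves $\hat\theta$ to 0 and the third population is then selected.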


6. Scale parameter problem. Here we assume that $X_1, X_2, \ldots, X_n$ are
independently distributed according to the densities $(1/\theta_1)\,p(x/\theta_1), \ldots,
(1/\theta_n)\,p(x/\theta_n)$ respectively. $p(x)$ is defined for $x \ge 0$ and taken for convenience
to be strictly positive. The densities differ only in their scale parameter.
In addition to the usual assumptions on the loss functions we will assume

$$L_i(c\theta_1, \ldots, c\theta_n) = L_i(\theta_1, \ldots, \theta_n).$$

We also hypothesize that when $\theta_1 = \cdots = \theta_n = \theta$, the maximum likelihood
estimate $\hat\theta$ exists and is sufficient for $\theta$. Finally, we assume that $p(x/\theta)$ possesses
a M.L.R. in the variables $x$ and $\theta$. As usual, we restrict attention to symmetric
procedures invariant under scale transformations where $x$ and $\theta$ traverse the
positive real line. An argument entirely analogous to that given in Section 5
leads to the following result.







THEOREM 6.1: Under the above assumptions the class of procedures of the form

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i/\hat\theta < c,$$

$$\varphi_j(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} x_i/\hat\theta > c \ \text{ and } \ x_j = \max_{1 \le i \le n} x_i \ \text{ and } \ x_j > x_i \ \text{ for all } i \ne j,$$

constitutes a minimal essentially complete class of symmetric invariant procedures.

The $\Gamma$-family of distributions provides us an example. In the uncontrolled case,

$$\hat\theta = (1/np) \sum_{i=1}^{n} x_i,$$

and in the controlled case

$$\hat\theta = \frac{1}{(n+1)p} \Big(\sum_{i=1}^{n} x_i + y\Big).$$

Thus, for the uncontrolled problem and the controlled problem the minimal
complete class of symmetric invariant procedures as characterized in Theorem
6.1 agrees with the class of procedures described in Section 4.3.


7. Combined translation and scale parameter slippage problem. Let $X_1,
X_2, \ldots, X_n$ be independent, and let $X_i$ have density $(1/\sigma)\,p((x - \theta_i)/\sigma)$
for $i = 1, 2, \ldots, n$. The density is known except for the location parameter
$\theta_i$ and the scale parameter $\sigma > 0$. The slippage problem refers to the location
parameters $\theta_i$. Again, we make the usual assumptions about the losses with the
added restriction that

$$L_i((\theta_1 + b)/a, \ldots, (\theta_n + b)/a, \sigma/a) = L_i(\theta_1, \ldots, \theta_n, \sigma)$$

for all real numbers $b$ and all positive numbers $a$. A slight extension of the
methods of the previous sections enables us to prove

THEOREM 7.1: Let $\prod_{i=1}^{n} p((x_i - \theta)/\sigma) = h(x)\, q(\hat\sigma/\sigma)\, r((\hat\theta - \theta)/\sigma)$ where
$q(\hat\sigma/\sigma)$ has a monotone likelihood ratio, and $r((\hat\theta - \theta)/\sigma)$ has, for each $\sigma$, a monotone
likelihood ratio. Then the class of procedures of the form:

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le i \le n} (x_i - \hat\theta)/\hat\sigma < c,$$

$$\varphi_i(x) = 1 \quad\text{if}\quad \max_{1 \le j \le n} (x_j - \hat\theta)/\hat\sigma > c \ \text{ and } \ x_i > x_j \ \text{ for all } j \ne i,$$

is a minimal complete class of invariant symmetric procedures.

Theorem 7.1 may be illustrated by Paulson's result for Normal variates involving
slippage of the mean parameter with unknown common variance. Another
application of Theorem 7.1 is on the density

$$p((x - \theta)/\lambda) = \frac{1}{\lambda}\, e^{-(x - \theta)/\lambda} \ \text{ for } x \ge \theta, \qquad p((x - \theta)/\lambda) = 0 \ \text{ for } x < \theta \qquad (\lambda > 0 \text{ and } \theta \text{ real}).$$

There are discrete analogues of the results of Sections 5-7 valid for the Pascal
family of distributions to which our methods apply.







8. Uniformly most powerful procedures. In this section we prove, subject to
slight smoothness restrictions, that each symmetric invariant Bayes procedure
involving a single critical parameter is uniformly most powerful amongst the
class of all symmetric invariant procedures having a prescribed error associated
with hypothesis $H_0$. The loss functions are assumed to be zero or one, according
as a correct or incorrect decision was made. All relevant distributions in this
section are assumed to derive from continuous densities.

We suppose that the problem has been reduced by invariance so that the
density $p_j(x; \Delta)$ under the condition that the $j$th population has slipped depends
on a single positive parameter $\Delta$. Moreover, as usual, the permutation
$\pi$ which interchanges $j$ and $k$ and leaves the other indices fixed satisfies

$$p_{\pi j}(x_\pi; \Delta)\, dx_\pi = p_j(x; \Delta)\, dx.$$

Finally, we suppose it has been demonstrated that all symmetric invariant
Bayes procedures are of the form

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1 \le j \le n} x_j - v(x) < c,$$

(1)

$$\varphi_i(x) = 1 \quad\text{if}\quad \max_{1 \le j \le n} x_j - v(x) > c \ \text{ and } \ x_i > x_j \ \text{ for all } j \ne i, \qquad i = 1, 2, \ldots, n,$$

where $v(x)$ is a symmetric function of $x$, and $c$ is a constant.

Consider a class of a priori symmetric distributions $F_{\Delta_0, \xi}$ depending on a
parameter $\xi$ constructed as follows. Let $\Delta_0$ be an arbitrary positive but fixed
real number. We define $F_{\Delta_0, \xi}$ as a discrete distribution which assigns mass $\xi$
to $p_0(x)$ and mass $(1 - \xi)/n$ to $p_j(x; \Delta_0)$, $j = 1, \ldots, n$. By our previous theory
we know that the Bayes procedure against $F_{\Delta_0, \xi}$ is of the form (1) with an
appropriate $c$ depending on $\xi$. A simple continuity argument shows that when $\xi$
varies between 1 and 0, the constant $c$ varies continuously from its largest
possible value to its smallest possible value. In particular, when $\xi = 1$, $\varphi_0(x) \equiv 1$,
and when $\xi = 0$, $\varphi_0(x) \equiv 0$. For any prescribed $c$, by continuity, we obtain the
existence of $\xi_0$ such that the given procedure of (1) defined by the constant $c$
is Bayes against $F_{\Delta_0, \xi_0}$. This can be done for any $\Delta_0 > 0$.

Let $\psi$ denote a decision procedure characterized as in (1) for which the probability
of accepting hypothesis zero when it is true has fixed size equal to

$$\int \psi_0(x)\, p_0(x)\, dx = 1 - \alpha \qquad (0 < \alpha < 1).$$

Consider any other symmetric invariant procedure $\varphi$ having the same prescribed
error associated with action $a_0$. Since $\psi$ is Bayes against $F_{\Delta_0, \xi_0}$ for a suitable
$\xi_0$ we have

$$\int \rho(\psi, \delta)\, dF_{\Delta_0, \xi_0}(\delta) \le \int \rho(\varphi, \delta)\, dF_{\Delta_0, \xi_0}(\delta)$$

or

$$\xi_0 \int [1 - \psi_0(x)]\, p_0(x)\, dx + [(1 - \xi_0)/n] \sum_{j=1}^{n} \int (1 - \psi_j(x))\, p_j(x, \Delta_0)\, dx$$

$$\le \xi_0 \int [1 - \varphi_0(x)]\, p_0(x)\, dx + [(1 - \xi_0)/n] \sum_{j=1}^{n} \int [1 - \varphi_j(x)]\, p_j(x, \Delta_0)\, dx.$$

(Remember that the loss due to a wrong decision is 1, independent of the nature
of the error.)

Since $\psi$ and $\varphi$ are symmetric invariant we infer that $\int (1 - \psi_j(x))\, p_j(x, \Delta_0)\, dx$
is independent of $j$. Since the error in rejecting $H_0$ is fixed, we obtain

$$(2) \qquad \int (1 - \psi_j(x))\, p_j(x, \Delta_0)\, dx \le \int (1 - \varphi_j(x))\, p_j(x, \Delta_0)\, dx$$

and this is true for any $\Delta_0 > 0$ by choosing in each case $\xi_0$ suitably.

Thus, we have proved that, amongst all symmetric invariant procedures
possessing a prescribed probability of rejecting hypothesis $H_0$ when it is true,
there is a single member (up to equivalence in terms of risk) in the class (1)
which is uniformly most powerful.

Another way to express the conclusion of (2) is as follows: Amongst all sym- 
metric procedures with a prescribed probability of accepting hypothesis zero 
when it is true, there is a unique decision procedure of type (1) which maximizes 
the probability of making the correct decision, whatever the true state of nature. 
That there is only one is clear by virtue of the fact that the related distributions 
are all continuous densities. 
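
In practice the critical constant $c$ in (1) must be chosen so that the probability of accepting hypothesis zero when it is true equals $1 - \alpha$. The source does not say how to compute it; the following Python sketch estimates $c$ by simulation for the special case $v(x) = \bar{x}$ with standard normal observations. The function names, the use of Monte Carlo, and the choice of the normal model are all assumptions made for illustration.

```python
import random
from typing import List


def null_statistics(n: int, trials: int, seed: int = 0) -> List[float]:
    """Simulate max_j x_j - xbar under the no-slippage hypothesis (standard normal data)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stats.append(max(x) - sum(x) / n)
    return stats


def critical_value(stats: List[float], alpha: float) -> float:
    """Return an empirical (1 - alpha) quantile of the simulated null statistics."""
    ordered = sorted(stats)
    index = min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))
    return ordered[index]
```

Calling `critical_value(null_statistics(5, 2000, seed=1), 0.05)` gives an approximate 5 percent critical value for five populations; the accuracy depends on the number of simulated trials.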


9. A multivariate slippage problem. Let $S$ be a $p \times p$ Wishart matrix with
covariance matrix $\Sigma$, and let $X_1, X_2, \ldots, X_n$ be independent normal random
$p$-vectors with mean vector $\theta_i$ respectively, and covariance matrix $\Sigma$. In addition,
let $S$ be independent of $(X_1, \ldots, X_n)$.

The loss functions satisfy conditions (1) and (2) of Section 2, where the $\theta_i$
are now vectors and $\Delta$ is a non-zero vector. Also assumption (B) of 4.1 is assumed
to hold. The loss functions will also be required to satisfy the invariance property

$$L_i(C\theta_1 + a, \ldots, C\theta_n + a, C\Sigma C') = L_i(\theta_1, \ldots, \theta_n, \Sigma)$$

for all non-singular $p \times p$ matrices $C$ and all $p$-component vectors $a$. We restrict
attention to symmetric procedures based on $X$, $S$ which are invariant under the
non-singular transformations $X \to CX$ and $S \to CSC'$, and under translations
of $X_i$ by the same vector.

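It is known that the density of the maximal invariant when the $j$th mean has
slipped can be calculated by integrating, with respect to the Haar measure of
the full linear group, the joint density $p(CX, CSC')$ transformed by a general
element of the group, viz.,

$$f(X, S) \iint \exp\big[-\tfrac{1}{2} \operatorname{tr}\big(CSC'\Sigma^{-1} + CXX'C'\Sigma^{-1} + \Sigma^{-1}\Theta_{(j)}\Theta_{(j)}' - 2\Sigma^{-1}CX\Theta_{(j)}'\big)\big]\, (d\theta\, dC/|C|^{q}),$$

where $X = (X_1, \ldots, X_n)$ is the matrix with column vectors as indicated and
$\Theta_{(j)} = (\theta, \ldots, \theta, \theta + \Delta, \theta, \ldots, \theta)$ is a matrix whose $j$th column is the vector
$\theta + \Delta$, while the remaining columns are each composed of the vector $\theta$. Here
$q$ denotes an appropriate real number. We will let $\Theta$ represent the $p \times n$ matrix
every column of which is composed of the vector $\theta$. Now completing the square
and integrating with respect to $\theta$, we have

$$\exp[-\tfrac{1}{2} \operatorname{tr} \Delta'\Sigma^{-1}\Delta]\, f(X, S) \iint \exp\big[-\tfrac{1}{2} \operatorname{tr} CSC'\Sigma^{-1} - \tfrac{1}{2} \operatorname{tr} CXX'C'\Sigma^{-1} + \operatorname{tr} \Sigma^{-1}CX_j\Delta' - \operatorname{tr} (n/2)\Sigma^{-1}\big(\theta\theta' - 2(C\bar{X} - (\Delta/n))\theta'\big)\big]\, (d\theta\, dC/|C|^{q})$$

$$= g(\Delta, \Sigma)\, f(X, S) \int \exp\big[-\tfrac{1}{2} \operatorname{tr} \Sigma^{-1}C(S + XX' - n\bar{X}\bar{X}')C' + \operatorname{tr} \Sigma^{-1}C(X_j - \bar{X})\Delta'\big]\, (dC/|C|^{q}).$$

Put $W = S + XX' - n\bar{X}\bar{X}'$ and the density is

$$g(\Delta, \Sigma)\, f(X, S) \int \exp\big[-\tfrac{1}{2} \operatorname{tr} \Sigma^{-1}CWC' + \operatorname{tr} \Delta'\Sigma^{-1}C(X_j - \bar{X})\big]\, (dC/|C|^{q}).$$

Introducing the new variable $D = \Sigma^{-1/2}C$, and defining $\eta' = \Delta'\Sigma^{-1/2}$, we have

$$g(\Delta, \Sigma)\, f(X, S) \int \exp\big[-\tfrac{1}{2} \operatorname{tr} DWD' + \operatorname{tr} \eta'D(X_j - \bar{X})\big]\, (dD/|D|^{q}).$$

(It should be understood that after each change of variables, the functions $g$ and
$f$ may change by a multiplicative factor. However, since the explicit expressions
of these functions are of no relevance, we will continue to use the same symbol
and no ambiguities will arise.)

Since $W$ is positive definite with probability one, we can reduce the integral
further by the change of variable $DW^{1/2} = E$. The density becomes

$$g(\Delta, \Sigma)\, f(X, S) \int \exp\big[-\tfrac{1}{2} \operatorname{tr} EE' + \operatorname{tr} EW^{-1/2}(X_j - \bar{X})\eta'\big]\, (dE/|E|^{q}).$$

Making use of polar decomposition of matrices we write

$$W^{-1/2}(X_j - \bar{X})\eta' = U\big[\eta (X_j - \bar{X})'W^{-1}(X_j - \bar{X})\eta'\big]^{1/2}$$

where $U$ is orthogonal. Now the change of variable $A = EU$ gives

$$g(\Delta, \Sigma)\, f(X, S) \int \exp\big\{-\tfrac{1}{2} \operatorname{tr} AA' + \operatorname{tr} A\big[\eta (X_j - \bar{X})'W^{-1}(X_j - \bar{X})\eta'\big]^{1/2}\big\}\, (dA/|A|^{q}).$$

Let $V$ be orthogonal and such that $V\big[\eta (X_j - \bar{X})'W^{-1}(X_j - \bar{X})\eta'\big]^{1/2}V' = \Lambda$
is diagonal. The resulting matrix is clearly of rank one since

$$\big[\eta (X_j - \bar{X})'W^{-1}(X_j - \bar{X})\eta'\big]^{1/2}$$

is of rank one. A further change of variable of $A$ into $V'AV$ gives

$$g(\Delta, \Sigma)\, f(X, S) \int \exp\big[-\tfrac{1}{2} \operatorname{tr} AA' + \operatorname{tr} A\Lambda\big]\, (dA/|A|^{q})$$

$$= g(\Delta, \Sigma)\, f(X, S) \int \exp\big[-\tfrac{1}{2} \operatorname{tr} AA' + a_{11}\{(\eta'\eta)(X_j - \bar{X})'W^{-1}(X_j - \bar{X})\}^{1/2}\big]\, (dA/|A|^{q}).$$

Let $Z_j = (X_j - \bar{X})'W^{-1}(X_j - \bar{X})$, and denote the integral by $p(Z_j, \delta)$
where $\delta = \eta'\eta$. We now show that $p(Z_j, \delta)$ has a monotone likelihood ratio,
and moreover, is a monotone function of $Z_j$ for each $\delta$, where $p$ denotes the integral
expression (1) excluding the multiplying factors.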

If we expand the integral in an infinite series and integrate term by term we
obtain $p(Z_j, \delta) = \sum_{m=0}^{\infty} C_m\, \delta^{m/2} Z_j^{m/2}$. The even coefficients $C_{2m}$ are non-negative.
We will prove that the odd coefficients are zero. Here

$$C_m = \frac{1}{m!} \int \exp\Big[-\tfrac{1}{2} \sum_{i,j} a_{ij}^2\Big]\, a_{11}^{m}\, (dA/|A|^{q}).$$

In fact, a change of variable which reverses the sign of $a_{11}$ leads to

$$C_{2m+1} = \frac{1}{(2m+1)!} \int \exp\Big[-\tfrac{1}{2} \sum_{i,j} d_{ij}^2\Big]\, (-d_{11}^{2m+1})\, (dD/|D|^{q}) = -C_{2m+1},$$

so $C_{2m+1} = 0$.

The fact that $p(Z_j, \delta)$ is monotone in $Z_j$ is now clear. The monotonicity of
the likelihood ratio follows from the general result in [4] which states that if
$q(Z, y)$ has a monotone likelihood ratio and $\varphi(y, \delta)$ has a monotone likelihood
ratio, then $p(Z_j, \delta) = \int q(Z_j, y)\, \varphi(y, \delta)\, d\mu(y)$ has a monotone likelihood
ratio, where $\mu$ represents any $\sigma$-finite measure. In our case

$$q(Z_j, y) = e^{y \log Z_j},$$

$$\varphi(y, \delta) = e^{y \log \delta},$$

both of which clearly possess monotone likelihood ratios.

Finally, careful examination of the derivation of the distribution of the maximal
invariant will show that the normalizing function $g(\Delta, \Sigma)$ is only a function
of $\delta$.

The Bayes procedures are now easily characterized as follows: Let $F(\delta)$ denote
an a priori distribution. $\varphi_0(X, S) = 1$ if for every $j$

$$\int \big[\xi_0 (L_0 - L_j)\, h_0(X, S) + \xi (L_0(j, \delta) - L_j(j, \delta))\, h_j(X, S, \delta)\big]\, dF(\delta) < 0,$$

where $h_0$ denotes the density of the maximal invariant when there is no slippage,
and $h_j$ is the density when the $j$th parameter has slipped. This can be written as

$$f(X, S) \int g(\delta)\, \big[C_0 \xi_0 (L_0 - L_j) + \xi (L_0(\delta) - L_j(\delta))\, p_j(Z_j, \delta)\big]\, dF(\delta) < 0.$$

Since the integral is monotone in $Z_j$, the inequality is equivalent to $Z_j < c$.
Moreover, since $p_j(Z_j, \delta)$ is strictly increasing in $Z_j$,

$$\int g(\delta)\, [p_i(Z_i, \delta) - p_j(Z_j, \delta)]\, dF(\delta) \ge 0$$

according as $Z_i \ge Z_j$. Thus, applying Theorem 3.3 we conclude that the form
of the Bayes solutions is

$$\varphi_0(X, S) = 1 \quad\text{if}\quad \max_{1 \le i \le n} (X_i - \bar{X})'W^{-1}(X_i - \bar{X}) < c,$$

$$\varphi_i(X, S) = 1 \quad\text{if}\quad \max_{1 \le j \le n} (X_j - \bar{X})'W^{-1}(X_j - \bar{X}) > c,$$

and

$$(X_i - \bar{X})'W^{-1}(X_i - \bar{X}) > (X_j - \bar{X})'W^{-1}(X_j - \bar{X}) \quad\text{for all } j \ne i,$$

where $W = S + XX' - n\bar{X}\bar{X}'$.
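
To make the statistic $Z_i = (X_i - \bar{X})'W^{-1}(X_i - \bar{X})$ concrete, here is a dependency-free Python sketch. It uses the identity $XX' - n\bar{X}\bar{X}' = \sum_i (X_i - \bar{X})(X_i - \bar{X})'$ to build $W$ and a small Gauss-Jordan solver in place of an explicit inverse. The function names are hypothetical, vectors are plain lists, and $W$ is assumed non-singular.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(p):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def slippage_quadratic_forms(xs: List[List[float]], s: List[List[float]]) -> List[float]:
    """Return Z_i = (x_i - xbar)' W^{-1} (x_i - xbar) with W = S + sum_i d_i d_i'."""
    n, p = len(xs), len(s)
    mean = [sum(x[r] for x in xs) / n for r in range(p)]
    devs = [[x[r] - mean[r] for r in range(p)] for x in xs]
    w = [[s[r][c] + sum(d[r] * d[c] for d in devs) for c in range(p)]
         for r in range(p)]
    return [sum(di * yi for di, yi in zip(d, solve(w, d))) for d in devs]
```

The decision rule then accepts the hypothesis of no slippage when the largest $Z_i$ is below $c$ and otherwise selects the population attaining the maximum.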
If there is a control population each symmetric invariant Bayes solution can
be determined in an analogous fashion. The explicit solution is: Let $Z_i = X_i -
Y$, and let $S$ be a Wishart matrix independent of $X_i$, $Y$.

$$\varphi_0(X, Y, S) = 1 \quad\text{if}\quad \max_{1 \le i \le n} \{Z_i - [n/(n+1)]\bar{Z}\}'W^{-1}\{Z_i - [n/(n+1)]\bar{Z}\} < c,$$

where $W = S + ZZ' - [n^2/(n+1)]\bar{Z}\bar{Z}'$;

$$\varphi_i(X, Y, S) = 1 \quad\text{if}\quad \max_{1 \le j \le n} \{Z_j - [n/(n+1)]\bar{Z}\}'W^{-1}\{Z_j - [n/(n+1)]\bar{Z}\} > c$$

and

$$\{Z_i - [n/(n+1)]\bar{Z}\}'W^{-1}\{Z_i - [n/(n+1)]\bar{Z}\} > \{Z_j - [n/(n+1)]\bar{Z}\}'W^{-1}\{Z_j - [n/(n+1)]\bar{Z}\} \quad\text{for all } j \ne i.$$

In most applications, the matrix $S$ represents the sample covariance matrix
based on several observations of a normal distribution with covariance matrix $\Sigma$.

The case where $\Sigma$ is known can be handled by similar methods. The solution
is to look at $\max_{1 \le i \le n} (X_i - \bar{X})'\Sigma^{-1}(X_i - \bar{X})$ in the non-controlled case,
and $\max_{1 \le i \le n} \{Z_i - [n/(n+1)]\bar{Z}\}'\Sigma^{-1}\{Z_i - [n/(n+1)]\bar{Z}\}$ in the case where
there is a control and $Z_i = X_i - Y$.


10. Some results for non-parametric problems. Let $X_{ij}$ be independent
random variables with $X_{ij}$ having a continuous c.d.f. $F_i$ for $i = 1, \cdots, k$;
$j = 1, 2, \cdots, n$. We will define two notions of slippage and, for each, give a
symmetric invariant procedure which has local optimum properties. These
notions were discussed by Lehmann [3] and I. R. Savage [10], and the solutions
we will give are direct applications of their results.

In general, we will say that the $j$th distribution has slipped if $F_i = F$ for all
$i \ne j$, and $F_j = g(F)$ where $g(z) \ne z$ is a continuous distribution function on
[0, 1]. The problem is invariant under monotone transformations, and hence any
invariant procedure will depend only on the ranks of the observations. First, let
us take $g(F) = (1-\lambda)F + \lambda F^2$. Lehmann has shown that if we let $r_{ij}$ denote the
rank of $X_{ij}$ in the combined sample, and $R$ the matrix of ranks, then if the $j$th
distribution has slipped, the probability of $R$ is


P})(R) = eu ‘)| Bf I (1 —\) + anu} ' 


where $U^{(r_{jl})}$ denotes the $r_{jl}$th order statistic in a sample of $nk$ for a distribution
uniform on [0, 1]. We want to find regions $C_0, C_1, \cdots, C_n$ in the set of possible
ranks so that $P_0(C_0) = 1 - \alpha$, the procedure is symmetric, and $P_\lambda^{(j)}(C_j)$ is
maximized for small $\lambda$. Since


(d/dx) PP? (C;) no = { » Beg panes) -*]}/( . ) 


Rec 4; imi nk —k 


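it is clear that the region $C_j$ is of the form $\max_{1\le i\le n}\sum_{l=1}^{k} r_{il} = \sum_{l=1}^{k} r_{jl} > \gamma$,
where $\gamma$ is chosen so that

$$P_0\Big[\max_{1\le i\le n}\sum_{l=1}^{k} r_{il} \le \gamma\Big] = 1 - \alpha.$$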


Now, consider slippage as follows. We let $g(F) = F^{1+\lambda}$ where $\lambda > 0$. We
order the combined sample and define for each $j$, $Z_j^{(i)} = 0$ or 1 according as the
$i$th member of the ordered sample is not, or is, from the $j$th distribution. Let $Z$
denote the set of $Z_j^{(i)}$. The distribution of $Z$ when the $j$th distribution has slipped
is, following Savage [10]


nk i 
P’(Z) = [(nk — k)tki( wi/{ I] (> [7° + a — ZY") + wi). 


We want to choose regions $C_0, C_1, \cdots, C_n$ in the set of all $Z$ so that $P_0(C_0) = 1 - \alpha$,
the procedure is symmetric, and $P_\lambda^{(j)}(C_j)$ is maximized for small $\lambda$. Since,


nk i 


(d/dd)P}”(C;) |rmo = oe [(nk — k) (mk) { k- 2 a - zi?)/a 
the region C; must be of the form 
nk I nk I 
max > » (Z)"/l) = d 2d (Z)?/l) >e 


t=] t=l 







and Cy of the form 
nk tI 
max (s = (z"/)) <e 
8 1 ml 
where $c$ is determined by the condition that $P_0(C_0) = 1 - \alpha$.


11. Selection of a procedure from a complete class. The complete classes of
procedures obtained are always of the form:

$$\varphi_0(x) = 1 \quad\text{if}\quad \max_{1\le i\le n} U_i(x) < c,$$

$$\varphi_j(x) = 1 \quad\text{if}\quad \max_{1\le i\le n} U_i(x) > c \ \text{ and } \ U_j(x) > U_i(x) \ \text{ for all } i \ne j.$$

The usual way of selecting a procedure from the complete class is to control the
probability of one of the errors. One may thus choose a number $\alpha$, $0 < \alpha < 1$,
and ask that $E(\varphi_0 \mid \theta_1 = \cdots = \theta_n) = 1 - \alpha$. Thus, the distribution of
$\max_{1\le i\le n} U_i(x)$ is needed to determine $c$. In most of the cases discussed, this distribution
is not tabulated (or even worked out). It has been suggested by Paulson that,
since the $U_i$ are negatively correlated and have the same distribution, a reasonable
approximation to the solution of

$$P\{\max_{1\le i\le n} U_i < c \mid \theta_1 = \cdots = \theta_n\} = 1 - \alpha$$

is provided by the solution of

$$P\{U_1(x) > c\} = \alpha/n.$$


Another alternative for finding c is to use a multivariate Chebychev inequality 
proposed by Olkin and Pratt [7]. In this way one can put a lower bound on the 
probability of deciding there was no slippage when this indeed is the case. The 
bounds given by Olkin and Pratt can be evaluated explicitly in case the correla- 
tion matrix has equal non-diagonal elements. 

Finally, the constant $c$ can be approximated by direct sampling.
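
As an illustration of the first and last suggestions, the following Python sketch compares the approximation $P\{U_1 > c\} = \alpha/n$ with direct sampling. The setting is an assumption chosen for simplicity and is not one of the cases treated above: $n$ independent unit-variance normal observations $X_i$ and $U_i = (X_i - \bar X)^2$, so that $U_1$ is distributed as $((n-1)/n)\chi^2_1$. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def paulson_c(n, alpha):
    """Solve P{U_1 > c} = alpha/n when U_1 = (X_1 - Xbar)^2 ~ ((n-1)/n) * chi-square(1)."""
    # P{chi2_1 > q} = p  <=>  q = z^2, with z the upper p/2 point of N(0, 1).
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n))
    return (n - 1) / n * z * z

def estimate_no_slippage_prob(n, c, reps=20000, seed=1):
    """Direct-sampling estimate of P{max_i U_i < c} when no population has slipped."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        if max((xi - xbar) ** 2 for xi in x) < c:
            hits += 1
    return hits / reps

n, alpha = 5, 0.05
c = paulson_c(n, alpha)
p_hat = estimate_no_slippage_prob(n, c)
print(f"c = {c:.3f}, estimated P(max U_i < c) = {p_hat:.4f}")
```

Because the approximation is a Bonferroni-type bound, the estimated probability should come out at or slightly above $1 - \alpha$, up to sampling error.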


REFERENCES 

[1] Bechhofer, Robert E., "A single sample multiple decision procedure for ranking
means of normal populations with known variances," Ann. Math. Stat., Vol. 24
(1953), pp. 16-39.

[2] Doornbos, R., and Prins, H. J., "Slippage tests for a set of gamma variates," Indagationes
Mathematicae, Vol. 18 (1956), pp. 329-337.

[3] Lehmann, Erich L., "The power of rank tests," Ann. Math. Stat., Vol. 24 (1953),
pp. 23-43.

[4] Karlin, S. and Rubin, H., "The theory of decision procedures for distributions with
monotone likelihood ratio," Ann. Math. Stat., Vol. 27 (1956), pp. 272-300.

[5] Kudo, A., "On the invariant multiple decision procedures," Bull. Math. Stat., Vol. 6
(1956), pp. 57-69.

[6] Mosteller, F., "A k-sample slippage test for an extreme population," Ann. Math.
Stat., Vol. 19 (1948), pp. 58-65.

[7] Olkin, I., and Pratt, J. W., "A multivariate Tchebycheff inequality," Ann. Math.
Stat., Vol. 29 (1958), pp. 226-234.

[8] Paulson, E., "An optimum solution to the k-sample slippage problem for the normal
distribution," Ann. Math. Stat., Vol. 23 (1952), pp. 610-616.

[9] Pratt, J. W., "Some results in the decision theory of one parameter multivariate Polya
type distributions," Stanford Technical Report No. 37, Oct. 1955.

[10] Savage, I. R., "Contributions to the theory of rank order statistics - the two sample
case," Ann. Math. Stat., Vol. 27 (1956), pp. 590-615.

[11] Truax, D., "An optimum slippage test for the variances of k-normal distributions,"
Ann. Math. Stat., Vol. 24 (1953), pp. 669-674.

[12] Lehmann, E., Testing Statistical Hypotheses, John Wiley and Sons, New York, 1959.





EFFECT ON THE MINIMAL COMPLETE CLASS OF TESTS
OF CHANGES IN THE TESTING PROBLEM¹


By D. L. Burkholder
University of Illinois 

1. Summary and introduction. A question of interest in connection with 
many statistical problems is the following: Does a slight change in the problem 
result in a different answer? Here the effect of changes in the testing problem 
on the minimal complete class of tests is investigated. The effects of such changes 
are found to be different for the two families of distributions considered: The 
discrete multivariate exponential family and the continuous multivariate ex- 
ponential family. In Section 2, it is shown that with respect to the discrete 
exponential family, the minimal complete class of tests for a standard testing 
problem is minimal complete for a wide variety of related problems. In Section 
3, an example is given showing that with respect to the continuous exponential 
family, on the other hand, the minimal complete class of tests for a standard 
problem is not necessarily minimal complete for a slight variation of this prob- 
lem. Tests that are admissible for the standard problem are not necessarily 
admissible for the variation. 

Partly in a general decision theoretic framework and partly with respect to 
specific examples, Hoeffding [2] has discussed the effect of changes in the family 
of probability distributions on the minimax solution and other optimal solu- 
tions. He has also given key references to the extensive literature on the per- 
formance of standard procedures for families of probability distributions not 
satisfying all the assumptions under which the standard procedures were de- 
rived. Workers in this area have primarily concentrated on the effect of changes 
in the probability model on a single solution rather than on a class of solutions, 
for example, the class of admissible solutions, as we do here.

We recall some basic ideas. Consider the probability structure $(\mathcal{X}, \mathcal{A}, P, \Omega)$
where $\mathcal{X}$ and $\Omega$ are sets, $\mathcal{A}$ is a $\sigma$-field of subsets of $\mathcal{X}$, and for each $\theta$ in $\Omega$,
$P_\theta$ is a probability measure on $\mathcal{A}$. Relative to the above structure, a testing problem
is an ordered pair $(\omega_0, \omega_1)$ of disjoint subsets of $\Omega$. A test $\varphi$ is a function from
$\mathcal{X}$ into [0, 1] measurable with respect to $\mathcal{A}$. The test $\varphi$ is used in the following
way: A random element $X$ with values in $\mathcal{X}$ having $P_\theta$ as its probability distribution
is observed. If $x$ is the outcome then the hypothesis $H$: $\theta \in \omega_0$ is rejected
with probability $\varphi(x)$ in favor of the alternative $A$: $\theta \in \omega_1$. If the test $\varphi$
is used and $\theta$ is the parameter, then the probability that $H$ is rejected is
$E_\theta\varphi = \int_{\mathcal{X}} \varphi(x)\,dP_\theta(x)$. If $\varphi$ and $\varphi^*$ are tests, then $\varphi$ is at least as good as $\varphi^*$ if


$$E_\theta\varphi - E_\theta\varphi^* \le 0$$


Received September 17, 1959. 
' This work was supported in part by the National Science Foundation. 


for $\theta$ in $\omega_0$, $\ge 0$ for $\theta$ in $\omega_1$. The test $\varphi$ is better than $\varphi^*$ if $\varphi$ is at least as good
as $\varphi^*$ but $\varphi^*$ is not at least as good as $\varphi$. A test $\varphi$ is admissible if there is no test
better than $\varphi$. A class of tests is complete if to each test not in the class there
corresponds a test in the class which is better. A class is minimal complete if it
is complete and no proper subclass is complete. The notions "essentially complete"
and "minimal essentially complete" are defined similarly with "at
least as good" substituted for "better."
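
The ordering of tests just defined can be checked mechanically on a small example. The Python sketch below is only an illustration under stated assumptions: a single binomial observation with $n = 10$, finite stand-ins for $\omega_0$ and $\omega_1$, and two hypothetical tests `phi` and `phi_star` that do not appear in the paper.

```python
from math import comb

def power(test, n, p):
    """E_p(test) = sum over x of test(x) * P_p{X = x}, for X binomial (n, p)."""
    return sum(test(x) * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))

def at_least_as_good(t1, t2, n, omega0, omega1, tol=1e-12):
    """t1 rejects no more often than t2 on omega0 and no less often on omega1."""
    return (all(power(t1, n, p) <= power(t2, n, p) + tol for p in omega0)
            and all(power(t1, n, p) >= power(t2, n, p) - tol for p in omega1))

def better(t1, t2, n, omega0, omega1):
    """t1 is at least as good as t2, but t2 is not at least as good as t1."""
    return (at_least_as_good(t1, t2, n, omega0, omega1)
            and not at_least_as_good(t2, t1, n, omega0, omega1))

n = 10
omega0, omega1 = [0.2, 0.3], [0.6, 0.7]      # finite stand-ins for the two hypotheses
phi = lambda x: 1.0 if x >= 6 else 0.0       # reject for large x
phi_star = lambda x: 0.5                     # ignore the data and toss a fair coin

print(better(phi, phi_star, n, omega0, omega1))
print(better(phi_star, phi, n, omega0, omega1))
```

Here `phi` is better than the coin-tossing test, so the latter is inadmissible for this toy problem.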


2. The discrete exponential case. Let $\mathcal{X}$ be a countable set, $h$ a function from
$\mathcal{X}$ into the positive numbers, and $s$ and $t$ functions from $\mathcal{X}$ into the real numbers.
Let $\Omega$ be the interior of the set of points $\theta = (\theta_1, \theta_2)$ where $\theta_1$ and $\theta_2$ are real
and satisfy

(1) $$\sum_{x\in\mathcal{X}} h(x)e^{\theta_1 s(x) + \theta_2 t(x)} < \infty.$$

For $\theta \in \Omega$, let $k(\theta)$ be the reciprocal of the left hand side of (1),
$p_\theta(x) = k(\theta)h(x)e^{\theta_1 s(x) + \theta_2 t(x)}$, $x \in \mathcal{X}$, and $P_\theta(A) = \sum_{x\in A} p_\theta(x)$ where
$A \in \mathcal{A}$, the collection of all subsets of $\mathcal{X}$.
For the testing problems considered here, the parameter $\theta_1$ is the more important
one, the parameter $\theta_2$ usually being a nuisance parameter. To keep the discussion
more compact we limit our attention here to the case of one nuisance parameter
although results similar to those that follow are obtainable in much the
same way for the case of more than one nuisance parameter.

Assume throughout this section that any nonempty subset of

$$Y = \{t(x) \mid x \in \mathcal{X}\}$$

contains a least element. For $y \in Y$ let $A_y = \{x \mid t(x) = y\}$.

Let $C$ be the class of tests such that $\varphi$ is in $C$ if and only if there is a function
$c$ such that $\varphi(x) = 1$ if $s(x) > c(t(x))$, $= 0$ if $s(x) < c(t(x))$, $x \in \mathcal{X}$. Let $D$
be the class of tests such that $\varphi$ is in $D$ if and only if there is a function $c$ and a
function $d$ such that $\varphi(x) = 1$ if $s(x) < c(t(x))$, $= 0$ if $c(t(x)) < s(x) < d(t(x))$,
$= 1$ if $s(x) > d(t(x))$, $x \in \mathcal{X}$.

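THEOREM 1. Let $(\omega_0, \omega_1)$ be a testing problem relative to the above described
probability structure $(\mathcal{X}, \mathcal{A}, P, \Omega)$ such that for $i = 0, 1$, the set
$\{(e^{\theta_1}, e^{\theta_2}) \mid \theta \in \omega_i\}$ has a limit point $(M_i, 0)$ where

$$\inf\{\theta_1 \mid \theta \in \Omega\} < \log M_0 < \log M_1 < \sup\{\theta_1 \mid \theta \in \Omega\}.$$

Then no test in $C$ is better than some other test in $C$. If, in addition,

$$\sup\{\theta_1 \mid \theta \in \omega_0\} \le \inf\{\theta_1 \mid \theta \in \omega_1\},$$

then $C$ is minimal complete.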

PROOF. Let $C_0$ be the class of all tests in $C$ that are functions of $x$ only through
$s$ and $t$. Thus, if $\varphi$ is in $C_0$ then $\varphi$ has the form $\varphi(x) = 1$ if $s(x) > c(t(x))$,
$= a(t(x))$ if $s(x) = c(t(x))$, $= 0$ if $s(x) < c(t(x))$. If $\varphi$ is in $C$ then by the
theory of sufficient statistics or by an easy calculation there is a $\varphi_0$ in $C_0$ such
that $E_\theta\varphi = E_\theta\varphi_0$, $\theta \in \Omega$.

Suppose that $\varphi$ and $\varphi^*$ are tests in $C_0$ and that $\varphi$ is at least as good as $\varphi^*$.
If we can show that $\varphi = \varphi^*$, implying that $\varphi$ is not better than $\varphi^*$, then the first
assertion of the theorem will follow.

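Let $\psi(x) = [\varphi(x) - \varphi^*(x)]h(x)$. Suppose that it is not true that $\psi(x) = 0$,
$x \in \mathcal{X}$. Then let $y$ be the least element $z$ of $Y$ not satisfying $\psi(x) = 0$, $x \in A_z$.
Then since $\varphi$ is at least as good as $\varphi^*$,

$$\sum_{x\in\mathcal{X}} \psi(x)e^{\theta_1 s(x) + \theta_2[t(x) - y]} \le 0, \ \theta \in \omega_0, \qquad \ge 0, \ \theta \in \omega_1,$$

so that taking limits of both sides as $(e^{\theta_1}, e^{\theta_2}) \to (M_i, 0)$ with $\theta \in \omega_i$ gives

(2) $$\sum_{x\in A_y} \psi(x)M_0^{s(x)} \le 0 \le \sum_{x\in A_y} \psi(x)M_1^{s(x)},$$

provided it is permissible to interchange this limiting operation with the summation
here. That this is permissible follows from the dominated convergence
theorem, since there are points $\theta^{(i)} = (\theta_{i1}, \theta_{i2})$, $i = 0, 1$, in $\Omega$ satisfying
$\theta_{01} < \log M_i < \theta_{11}$, $i = 0, 1$, and $G = \sum_{i=0}^{1} h\,e^{\theta_{i1}s + \theta_{i2}(t - y)}$ satisfies
$\sum_{x\in\mathcal{X}} G(x) < \infty$ and $|\psi e^{\theta_1 s + \theta_2(t - y)}| \le G$ for $\theta$ satisfying

$$\theta_{01} < \theta_1 < \theta_{11}, \qquad \theta_2 < \min_i \theta_{i2}.$$

Since $\varphi$ and $\varphi^*$ are in $C_0$ either $\psi(x) \ge 0$, $x \in A_y$, or $\psi(x) \le 0$, $x \in A_y$.
Thus, by (2) we have that $\psi(x) = 0$ for $x \in A_y$. But this contradicts the definition
of $y$. Hence $\psi(x) = 0$, $x \in \mathcal{X}$, implying $\varphi = \varphi^*$, and the first assertion is
proved.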

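Now suppose, in addition, that $\sup\{\theta_1 \mid \theta \in \omega_0\} \le \inf\{\theta_1 \mid \theta \in \omega_1\}$. Let
$r = \sup\{\theta_1 \mid \theta \in \omega_0\}$. A general result obtained by Truax [7] implies here that $C$
is essentially complete for the testing problem $(\omega_0(r), \omega_1(r))$ where

$$\omega_0(r) = \{\theta \mid \theta \in \Omega, \ \theta_1 \le r\}$$

and

$$\omega_1(r) = \{\theta \mid \theta \in \Omega, \ \theta_1 > r\}.$$

This could also be shown by slightly modifying arguments contained in a paper
by Lehmann and Scheffé [4]. This readily implies that $C$ is essentially complete
for $(\omega_0, \omega_1)$ as follows. For any test $\varphi$, $E_\theta\varphi$ is continuous in $\theta$ for $\theta$ in $\Omega$. Suppose
$\varphi^*$ is not in $C$. Then there is a test $\varphi$ in $C$ such that $E_\theta\varphi - E_\theta\varphi^* \le 0$ if $\theta \in \omega_0(r)$,
$\ge 0$ if $\theta \in \omega_1(r)$ and hence by continuity if $\theta \in \Omega$, $\theta_1 \ge r$. Thus, for the problem
$(\omega_0, \omega_1)$, $\varphi$ is at least as good as $\varphi^*$ and it follows that $C$ is essentially complete.

We now show that $C$ is complete. Since $C$ is essentially complete, this will
follow from showing that, if $\varphi$ is in $C$ and $\varphi^*$ is any test satisfying

$$E_\theta\varphi = E_\theta\varphi^*, \quad \theta \in \omega_0 \cup \omega_1,$$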





then $\varphi^*$ is in $C$. Suppose $\varphi$ and $\varphi^*$ are such tests where $\varphi(x) = 1$ if $s(x) > c(t(x))$,
$= 0$ if $s(x) < c(t(x))$, $x \in \mathcal{X}$. Then $\varphi^*(x) = 1$ if $s(x) > c(t(x))$, $= 0$ if
$s(x) < c(t(x))$, $x \in \mathcal{X}$.


For suppose this is not true. Let y be the least element z of Y not satisfying 


¢g*(z) = 1 if s(x) > e(z), 


0 if s(x) < e(z), 
7 le(z) — o*(x) h(x) = 0. s(x) = c(z) 
Then, by a limiting process similar to one used above,

$$\sum_{x\in A_y} [\varphi(x) - \varphi^*(x)]h(x)M_i^{s(x)} = 0, \quad i = 0, 1.$$

But this, with the assumption that $M_0 < M_1$, implies by a standard Neyman-Pearson
type calculation that (3) must be true for $z = y$, a contradiction.
Thus $\varphi^*$ has the desired form and is therefore in $C$.

The minimal completeness of C now follows from the completeness of C 
and the first assertion of the theorem. 

REMARKS. 

(i) Theorem 1 indicates that $C$, minimal complete for the standard problem
of testing $\theta_1 \le r$ against $\theta_1 > r$, remains minimal complete if the testing problem
is changed provided that certain conditions are satisfied. A simple kind of
permissible change is the introduction of an indifference zone. For example,
$C$ is minimal complete for testing $\theta_1 \le r_1$ against $\theta_1 \ge r_2$ where $r_1 < r_2$.

(ii) It is clear from the theorem and proof that $C_0$ is minimal essentially
complete.

(iii) The inequality $M_0 < M_1$ was not needed in the proof of the first assertion.

(iv) A result obtained by Lehmann [3] is related to this theorem. Lehmann
considers only the testing problem $\theta_1 \le r$ against $\theta_1 > r$, but in the setting of
the general exponential family. He shows that $C_0$ is minimal essentially complete
for testing $\theta_1 \le r$ against $\theta_1 > r$.

THEOREM 2. Let $(\omega_0, \omega_1)$ be a testing problem relative to the above described
probability structure $(\mathcal{X}, \mathcal{A}, P, \Omega)$ such that $\omega_1 = \omega_2 \cup \omega_3$ and for $i = 0, 2, 3$, the
set $\{(e^{\theta_1}, e^{\theta_2}) \mid \theta \in \omega_i\}$ has a limit point $(M_i, 0)$, where

$$\inf\{\theta_1 \mid \theta \in \Omega\} < \log M_2 < \log M_0 < \log M_3 < \sup\{\theta_1 \mid \theta \in \Omega\}.$$

Then no test in $D$ is better than some other test in $D$. If, in addition,

$$\sup\{\theta_1 \mid \theta \in \omega_2\} \le \inf\{\theta_1 \mid \theta \in \omega_0\}$$

and $\sup\{\theta_1 \mid \theta \in \omega_0\} \le \inf\{\theta_1 \mid \theta \in \omega_3\}$ then $D$ is minimal complete.

The proof exactly parallels the proof of Theorem 1 and is therefore omitted.
Theorem 2 implies that $D$ is minimal complete for the standard problem of
testing $\theta_1 = r$ against $\theta_1 \ne r$ and is also minimal complete for such problems
as testing $r_2 \le \theta_1 \le r_3$ against $\theta_1 \le r_1$ or $\ge r_4$ where

$$r_1 < r_2 \le r_3 < r_4.$$


EXAMPLES. Theorems 1 and 2 have straightforward applications to testing
problems with respect to certain multinomial distributions, distributions arising
in contingency table analysis, and also to two-sample binomial, Poisson,
negative binomial distributions, and so forth. We examine a little more closely a
typical example, the two-sample binomial case. Let $X = (X_1, X_2)$ where
$X_1$ and $X_2$ are independent random variables and $X_i$ is binomial $(n_i, p_i)$,
$i = 1, 2$. Here we may let

$$s(x) = x_1, \qquad t(x) = x_1 + x_2, \qquad x = (x_1, x_2),$$

$$\theta_1 = \log\Big(\frac{p_1}{1 - p_1}\Big/\frac{p_2}{1 - p_2}\Big), \qquad \theta_2 = \log\frac{p_2}{1 - p_2}.$$

Theorem 1 implies that $C$ is minimal complete for any of the following testing
problems: $p_1/(1 - p_1) \le r_1p_2/(1 - p_2)$ against $p_1/(1 - p_1) \ge r_2p_2/(1 - p_2)$;
$p_1 \le r_3p_2$ against $p_1 \ge r_4p_2$; $p_1 = r_3p_2$ against $p_1 = r_4p_2$; and so forth. Here
$r_1 < r_2$, $r_3 < r_4$, $r_3 \le 1 \le r_4$. It is not hard to see that $C$ is not necessarily
minimal complete for such problems as testing $p_1 - p_2 \le r_1$ against $p_1 - p_2 \ge r_2$
where $r_1 < 0 < r_2$. Here the indifference zone is too large in the sense that
the first assumption of Theorem 1 is not satisfied. For this problem $C$ contains
too many tests, some tests in $C$ being inadmissible. Theorem 2 implies that $D$
is minimal complete for a variety of two-sided testing problems involving the
ratio $(p_1/(1 - p_1))/(p_2/(1 - p_2))$ or the ratio $p_1/p_2$.


Thus, Theorems 1 and 2 imply that the tests described and investigated by 
Fisher [1], Tocher [6], Sverdrup [5], and Lehmann [3], among others, in connec- 
tion with the two sample binomial and related problems are admissible (since 
they are in C or in D) not only for standard testing problems but also for useful 
modifications of these problems. 
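
For the two-sample binomial case, a test in $C$ rejects when $x_1$ exceeds a cutoff that depends on $x_1 + x_2$ only. The Python sketch below builds one such test and evaluates its rejection probability. The particular cutoff rule (the conditional, hypergeometric tail at level `alpha`, in the spirit of the conditional tests cited above), the choice $\varphi = 0$ on the boundary $x_1 = c(x_1 + x_2)$, the sample sizes, and all function names are assumptions made for illustration.

```python
from math import comb

def conditional_cutoffs(n1, n2, alpha):
    """For each t = x1 + x2, the smallest c with P{X1 > c | X1 + X2 = t} <= alpha when p1 = p2."""
    cutoffs = {}
    for t in range(n1 + n2 + 1):
        lo, hi = max(0, t - n2), min(n1, t)
        total = comb(n1 + n2, t)
        tail, c = 0.0, hi                 # P{X1 > hi | t} = 0
        for s in range(hi, lo - 1, -1):
            prob = comb(n1, s) * comb(n2, t - s) / total   # hypergeometric probability
            if tail + prob > alpha:
                break
            tail += prob
            c = s - 1
        cutoffs[t] = c
    return cutoffs

def rejection_probability(n1, n2, cutoffs, p1, p2):
    """E(phi) for the test phi(x) = 1 if x1 > c(x1 + x2), = 0 otherwise."""
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if x1 > cutoffs[x1 + x2]:
                total += (comb(n1, x1) * p1**x1 * (1 - p1)**(n1 - x1)
                          * comb(n2, x2) * p2**x2 * (1 - p2)**(n2 - x2))
    return total

n1 = n2 = 8
alpha = 0.05
c = conditional_cutoffs(n1, n2, alpha)
size = rejection_probability(n1, n2, c, 0.5, 0.5)
pw = rejection_probability(n1, n2, c, 0.8, 0.2)
print(f"rejection probability at p1 = p2 = 0.5: {size:.4f}; at (0.8, 0.2): {pw:.4f}")
```

Any other function `c` of $x_1 + x_2$ gives another member of $C$; the conditional cutoff is used here only because it keeps the rejection probability at most `alpha` whenever $p_1 = p_2$.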


3. Counterexample for the continuous exponential case. In the previous 
section it was seen that, with respect to the discrete exponential family of dis- 
tributions, the minimal complete class of tests for a standard problem often 
proved to be minimal complete for a wide variety of related problems. We now 
show by an example that the situation is much different for the continuous ex- 
ponential family of distributions. This example, though dealing only with a 
special subfamily of the continuous exponential family, does not seem to be 
atypical and does seem to reveal the essential features of the general case. 

Let $X = (X_1, X_2)$ where $X_1$ and $X_2$ are independent random variables and
$X_i$ has the waiting time density with parameter $\lambda_i$, $i = 1, 2$. That is, for
$x = (x_1, x_2)$ in the first quadrant, the value of the joint density is

$$\lambda_1\lambda_2e^{-(\lambda_1x_1 + \lambda_2x_2)} = \lambda_1\lambda_2e^{(\lambda_2 - \lambda_1)x_1 - \lambda_2(x_1 + x_2)}.$$

Thus, for the standard problem of testing $\lambda_2 - \lambda_1 \le r$ against $\lambda_2 - \lambda_1 > r$,
the minimal complete class of tests is $C$ where $\varphi^*$ is in $C$ if and only if there is a
test $\varphi$ and a function $c$ such that $\varphi(x) = 1$ if $x_1 > c(x_1 + x_2)$, $= 0$ if
$x_1 < c(x_1 + x_2)$, and $\varphi(x) = \varphi^*(x)$ a.e. (Lebesgue) for $x$ in the first quadrant.
(See Lehmann [3].)

The class $C$ is too large to be minimal complete for the related problem of
testing $\lambda_2 - \lambda_1 = -1$ against $\lambda_2 - \lambda_1 = 1$ as we now show. Let $\varphi^*$ be any test
such that $\varphi^*(x) = 1$ if $x_1 > c^*(x_1 + x_2)$, $= 0$ if $x_1 < c^*(x_1 + x_2)$, where for
each nonnegative integer $n$,

$$c^*(y) = 0 \quad\text{if}\quad 1/2^{2n+1} < y - 2 < 1/2^{2n},$$
$$\phantom{c^*(y)} = \log 4 \quad\text{if}\quad 1/2^{2n+2} < y - 2 < 1/2^{2n+1}.$$

Then $\varphi^*$ is in $C$ and $\varphi^*$ is inadmissible. For let $\varphi$ be the test satisfying $\varphi(x) = 1$
if $x_1 > c(x_1 + x_2)$, $= 0$ if $x_1 < c(x_1 + x_2)$, where $c(y) = \log 2$ if $2 < y < 3$,
$= c^*(y)$ otherwise. Then $\varphi$ is better than $\varphi^*$ as can be seen as follows. Straightforward
calculation gives


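$$E_\lambda\varphi - E_\lambda\varphi^* = \frac{\lambda_1\lambda_2}{\lambda_2 - \lambda_1}\int_2^3 e^{-\lambda_2y}\left[e^{(\lambda_2 - \lambda_1)c^*(y)} - e^{(\lambda_2 - \lambda_1)c(y)}\right]dy,$$

where $\lambda = (\lambda_1, \lambda_2)$. Thus, if for $\eta > 0$ and $r = -1, 1$,

(4) $$\int_2^3 e^{-\eta y}\left[e^{rc^*(y)} - e^{rc(y)}\right]dy > 0,$$

then $E_\lambda\varphi - E_\lambda\varphi^* < 0$ if $\lambda_2 - \lambda_1 = -1$, $> 0$ if $\lambda_2 - \lambda_1 = 1$, implying that $\varphi$
is better than $\varphi^*$. Consider the inequality (4) for $r = 1$, the other case being
proved similarly. For $\eta > 0$,

$$\int_2^3 e^{-\eta y}\left[e^{c^*(y)} - e^{c(y)}\right]dy = e^{-2\eta}\sum_{n=0}^{\infty}\left[\int_{2^{-(2n+1)}}^{2^{-2n}} e^{-\eta y}(1 - 2)\,dy + \int_{2^{-(2n+2)}}^{2^{-(2n+1)}} e^{-\eta y}(4 - 2)\,dy\right] > 0,$$

since the $n$th term in the series is greater than

$$2\cdot2^{-(2n+2)}\exp(-\eta/2^{2n+1}) - 2^{-(2n+1)}\exp(-\eta/2^{2n+1}) = 0.$$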

REFERENCES 
[1] R. A. Fisher, "The logic of inductive inference," J. Roy. Stat. Soc., Vol. 98 (1935), pp.
39-54.

[2] Wassily Hoeffding, "The role of assumptions in statistical decisions," Proceedings
of the Third Berkeley Symposium on Mathematical Statistics and Probability,
Vol. I, University of California Press, 1956, pp. 105-114.

[3] E. L. Lehmann, "Significance level and power," Ann. Math. Stat., Vol. 29 (1958), pp.
1167-1176.

[4] E. L. Lehmann and Henry Scheffé, "Completeness, similar regions, and unbiased
estimation-Part II," Sankhyā, Vol. 15 (1955), pp. 219-236.

[5] Erling Sverdrup, "Similarity, unbiasedness, minimaxibility and admissibility of
statistical test procedures," Skand. Aktuarietids., Vol. 36 (1953), pp. 64-86.

[6] K. D. Tocher, "Extension of the Neyman-Pearson theory of tests to discontinuous
variates," Biometrika, Vol. 37 (1950), pp. 130-144.

[7] Donald R. Truax, "Multi-decision problems for the multivariate exponential family,"
Stanford Technical Report No. 32 (1955).





SOME PROPERTIES OF A CLASS OF BAYES TWO-STAGE TESTS 


By Morris Skibinsky


Purdue University¹


Summary. Let $X$ be a normal random variable with variance one and mean
either $\pm a/2$, where $a$ is a given positive constant. Let $C_m$ ($m \ge 0$) denote the
class of all two-stage rules with first sample size of $m$ for deciding, after two
successive samples of independent observations on $X$, which of the two mean
values is correct. This paper investigates the class of Bayes rules in $C_m$,
parametrized by a priori probabilities on the hypotheses, and simple wrong 
decision losses. The cost per observation is taken throughout to be unity. 

Section 1 gives some general properties of Bayes rules in $C_m$ for decisions
between any two continuous densities for $X$; Sections 2, 3, and 4 concern the
densities specified above. Section 2 consists of a detailed development of Bayes 
second sample size properties in terms of the Bayes parameters and first sample 
outcomes. For example, Theorem 2.1 gives non-trivial lower and upper bounds 
for positive values of the Bayes second sample size corresponding to any fixed 
value for the minimum wrong decision loss. In Section 3, sufficient conditions 
are given under which the losses may be chosen so as to obtain Bayes rules with 
preassigned invariant error probabilities. (Invariance is taken with respect to
changes in the prior probabilities.) It is shown how this result leads to rules which
minimize the maximum expected sample size among rules in $C_m$ with error
probabilities less than or equal to specified values. An illustrative example is
considered for the case when these specified bounds are equal. The selection of an 
optimum first sample size for this example is treated in Section 4. The resulting 
rule has the above described good property among all two-stage rules (of any 
first sample size) subject to these bounds. Tables are included giving optimum 
first and second sample sizes and the values of auxiliary functions when this 
common bound on the error probabilities is .05 and .01. 


1. Introduction. Let $f_0$ and $f_1$ be two given probability densities, and suppose
that $X$ is a random variable which has a probability density known, a priori, to
be either $f_0$ or $f_1$. By the decision $i$ we shall mean the decision to accept $f_i$ as the
true density of $X$. Unless otherwise explicitly noted, the index $i$ will always take
on the values 0, 1. Our problem is parametrized by four positive numbers, $g_i$, $W_i$,
with

(1.1) $$g_0 + g_1 = 1.$$

It will be helpful to regard $g_i$ as the "a priori probability" of $f_i$, and $W_i$ as
the loss incurred when $f_i$ is the density of $X$, and the decision $1 - i$ is made,
although this interpretation is not essential.


Received, October 10, 1958; revised August 3, 1959. 
1 Work on Section 4 was done at the Brookhaven National Laboratory. 




The a priori probabilities, in view of (1.1), are uniquely determined by their
ratio, $g = g_0/g_1$. The losses are uniquely determined by their ratio and minimum,
$W = W_0/W_1$, $M = \min(W_0, W_1)$. These three numbers in turn uniquely
determine the a priori probabilities and losses, i.e.

$$g_0 = g/(1 + g), \quad g_1 = 1/(1 + g), \quad\text{and}\quad W_0 = M\psi(W), \quad W_1 = M\psi(1/W),$$

where

$$\psi(\zeta) = 1 \ \text{ if } \zeta \le 1, \qquad = \zeta \ \text{ if } \zeta > 1.$$

We shall refer as convenience requires, sometimes to one, $(g_0, g_1, W_0, W_1)$,
sometimes to the other, $(g, W, M)$, of these two equivalent sets of parameters.
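
The equivalence of the two parametrizations can be written out as a pair of conversions. The following Python sketch is illustrative only; the function names and the numerical values are hypothetical.

```python
def psi(z):
    """psi(z) = 1 if z <= 1, = z if z > 1."""
    return z if z > 1.0 else 1.0

def to_ratios(g0, g1, w0, w1):
    """(g0, g1, W0, W1) -> (g, W, M)."""
    return g0 / g1, w0 / w1, min(w0, w1)

def from_ratios(g, w, m):
    """(g, W, M) -> (g0, g1, W0, W1)."""
    return g / (1.0 + g), 1.0 / (1.0 + g), m * psi(w), m * psi(1.0 / w)

g0, g1, w0, w1 = 0.25, 0.75, 30.0, 12.0
g, w, m = to_ratios(g0, g1, w0, w1)
recovered = from_ratios(g, w, m)
print((g, w, m), recovered)
```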

Let $X_1, X_2, \cdots$ denote a sequence of independent random variables identically
distributed with $X$. We shall refer to the members of this sequence as observations.
Denote by $\mathbf{X}_j$ the vector of the first $j$ observations. Let $E_i(\cdot)$ denote the
expectation operator under $f_i$; $E_i(\cdot \mid \mathbf{X}_k)$, the conditional expectation, under
$f_i$, given the first $k$ observations; and $E(\cdot)$, the operator, $g_0E_0(\cdot) + g_1E_1(\cdot)$.

We shall consider the class, $C_m$, of two-stage rules $S$ for deciding between $f_0$
and $f_1$ which depend on a non-negative first sample size $m$; a non-negative second
sample size, $\nu_m(\mathbf{X}_m)$, dependent on the outcome of the first sample; and a
randomized terminal decision probability, $D_n(\mathbf{X}_n)$, $n = m + \nu_m(\mathbf{X}_m)$, dependent
on the outcome of both samples, where we must accept $f_1$ with this probability
or $f_0$ with one minus this probability. We suppose, for $m > 0$, that the second
sample size and terminal decision functions are always measurable with respect
to $m$ and $n$ dimensional Borel sets, respectively, and that the expectations, defined
below, exist. It can be shown that no loss in generality derives, in the
present case, from failure to consider procedures which randomize first and
second sample sizes.

The expected overall sample size, under $f_i$, associated with a rule $S$ in $C_m$, is
$\mathcal{E}_i(S) = m + E_i\nu_m(\mathbf{X}_m)$. The probability, under $f_i$, that it will lead to decision
$1 - i$, may be written

$$Q_i(S) = E_iV_{im}(\mathbf{X}_m, \nu_m(\mathbf{X}_m)),$$

where

$$V_{im}(\mathbf{X}_m, v) = i + (1 - 2i)E_i\{D_{m+v}(\mathbf{X}_{m+v}) \mid \mathbf{X}_m\}$$

is the conditional probability, under $f_i$, given the first sample observations (and
second sample size equal to $v$), that the rule will lead to decision $1 - i$.
The average risk associated with a rule, $S$, in $C_m$ is defined to be

(1.2) $$R(g, W, M \mid S) = \sum_{i=0}^{1} g_i\{\mathcal{E}_i(S) + W_iQ_i(S)\}.$$

We define a Bayes two-stage rule (with respect to $g$, $W$, $M$), for deciding between
$f_0$ and $f_1$ to be a rule $S$ in $C_m$ for which $\nu_m$ and $D_n$ minimize the average risk.
It is easy to show that a terminal decision function which minimizes the average
risk with respect to any given triplet of parameters, $g$, $W$, $M$, for any overall
sample size $n$ and any overall sample $\mathbf{X}_n$ is given by

(1.3) $$D_n^*(\mathbf{X}_n, gW) = (1, \tfrac12, 0)$$

according as $\prod_{j=1}^{n} f_1(X_j)/f_0(X_j)\ (>, =, <)\ gW$, where the choice of $\tfrac12$ in the
case of equality has been made simply for convenience in later definitions, and
the product is defined to be one, when $n = 0$.


We may express the average risk (1.2) in the form

$$R(g, W, M \mid S) = m + EL_m(\mathbf{X}_m, \nu_m(\mathbf{X}_m), g, W, M),$$

where

(1.4) $$L_m(\mathbf{X}_m, v, g, W, M) = v + \sum_{i=0}^{1} W_ig_{im}(\mathbf{X}_m)V_{im}(\mathbf{X}_m, v)$$

is the conditional average expected loss associated with $S$, given the first sample
observations, $\mathbf{X}_m$, and second sample size equal to $v$, and

$$g_{im}(\mathbf{X}_m) = g_i\prod_{j=1}^{m} f_i(X_j)\Big/\sum_{k=0}^{1} g_k\prod_{j=1}^{m} f_k(X_j)$$

is the "a posteriori probability," given the first sample observations, that $f_i$ is
the density of $X$.

Since the values of $L_m$ are always bounded below by $v$, there must exist a
second sample size which minimizes the average risk with respect to any given
triplet of parameters, $g$, $W$, $M$, and first sample outcome $\mathbf{X}_m$. When the terminal
decision function is given by (1.3), we shall denote the values of such a function
by $\nu_m^*(\mathbf{X}_m, g, W, M)$. The form of the terminal decision function (1.3) then
implies that $0 \le \nu_m^*(\mathbf{X}_m, g, W, M) \le \min[W_0g_{0m}(\mathbf{X}_m), W_1g_{1m}(\mathbf{X}_m)] \le M$. We
shall refer to $\nu_m^*$ as a Bayes second sample size function. In an analogous way it
can be shown that a Bayes first sample size exists and must always lie in the
interval [0, 2M].

Let $S^*(g, W, M)$ denote the rule in $C_m$ with second sample size function,
$\nu_m^*$, and terminal decision function (1.3). Clearly, $S^*(g, W, M)$ is a Bayes rule.
We state the following immediate consequence of this fact for later reference.

LEMMA 1.1. If $S$ is any rule in $C_m$ such that $Q_i(S) \le Q_i(S^*(g, W, M))$, then
$\sum_{i=0}^{1} g_i\mathcal{E}_i(S^*(g, W, M)) \le \sum_{i=0}^{1} g_i\mathcal{E}_i(S)$.


2. The Bayes second sample size function. In that which follows, we shall be
concerned with the class of two-stage rules $S^*(g, W, M)$, defined above, for an
arbitrary value of $g$, as $W$ and $M$ are allowed to vary. The problem will be to
distinguish specifically between the two densities

(2.1) $$f_i(x) = (2\pi)^{-1/2}\exp\{-\tfrac12[x + (\tfrac12 - i)a]^2\}$$

(where $a$ is some given positive constant). We shall throughout use the notation

$$\Phi(x) = (2\pi)^{-1/2}\int_x^{\infty}\exp[-\tfrac12z^2]\,dz.$$

The present section contains an outline of basic functional properties associated
with the Bayes second sample size and related functions, arranged as a
sequence of lemmas and theorems in the order of their dependence. Proofs are
mostly omitted due to space requirements. Some of greater interest or importance
are sketched.

In view of the specified densities $f_i$, we may write our terminal decision function
(1.3) as $D_n^*(\mathbf{X}_n, gW) = (1, \tfrac12, 0)$ according as $an\bar X_n\ (>, =, <)\ \ln gW$,
where $\bar X_n = \sum_{j=1}^{n} X_j/n$, $n > 0$, $\bar X_0 = 0$.
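
The reduction of (1.3) to a comparison of $an\bar X_n$ with $\ln gW$ can be checked directly, since $\ln[f_1(x)/f_0(x)] = ax$ for the densities (2.1). The Python sketch below is illustrative; the function names and sample values are hypothetical.

```python
from math import log

def decision_general(xs, g, w, a):
    """(1.3) on the log scale: compare the sum of log f_1(x)/f_0(x) with log(gW)."""
    def log_f(x, i):
        # log of (2.1) without the common constant
        return -0.5 * (x + (0.5 - i) * a) ** 2
    llr = sum(log_f(x, 1) - log_f(x, 0) for x in xs)   # empty sum = 0, i.e. product = 1
    bound = log(g * w)
    return 1.0 if llr > bound else (0.0 if llr < bound else 0.5)

def decision_normal(xs, g, w, a):
    """Specialized form: compare a * n * xbar_n with ln(gW), with xbar_0 = 0."""
    stat = a * sum(xs)                                  # a * n * xbar_n
    bound = log(g * w)
    return 1.0 if stat > bound else (0.0 if stat < bound else 0.5)

sample = [0.3, -1.2, 0.8, 1.5]
for g, w in [(1.0, 2.0), (3.0, 2.0)]:
    print(decision_general(sample, g, w, 1.0), decision_normal(sample, g, w, 1.0))
```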

The Bayes second sample size $\nu_m^*(\mathbf{X}_m, g, W, M)$ associated with the rule
$S^*(g, W, M)$ is, for fixed values of its arguments, a value of $v \ge 0$, with respect
to which (1.4) is minimum. For convenience in treatment, we may (again in
view of the specified densities) express the function (1.4) in the following form.

$$L_m(\mathbf{X}_m, v, g, W, M) = a^{-2}\mathcal{L}(a^2v, am\bar X_m - \ln gW, W, a^2M),$$

where

(2.2) $$\mathcal{L}(y, t, \zeta, \mu) = y + \mu\psi(\zeta)[\alpha(y, t) + e^t\alpha(y, -t)]/(1 + \zeta e^t),$$

for $0 \le y, \zeta, \mu < \infty$, $-\infty < t < \infty$, and

(2.3) $$\alpha(y, t) = \Phi(.5\sqrt{y} - t/\sqrt{y}), \quad y > 0,$$
$$\alpha(0, t) = \lim_{y\to+0}\alpha(y, t) = (0, \tfrac12, 1) \ \text{ as } \ t\ (<, =, >)\ 0.$$

Observe that

(2.4) $$\mathcal{L}(y, -t, 1/\zeta, \mu) = \mathcal{L}(y, t, \zeta, \mu).$$
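
The function (2.2) and the symmetry (2.4) are easy to evaluate numerically. In the Python sketch below, $\Phi$ is taken to be the upper tail of the standard normal distribution, which is an assumption: it is the reading under which (2.3) has the stated limits $(0, \tfrac12, 1)$ at $y = 0$. The function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

_STD = NormalDist()

def upper_tail(x):
    """Assumed reading of Phi: P{N(0, 1) > x}."""
    return 1.0 - _STD.cdf(x)

def psi(z):
    return max(1.0, z)

def alpha(y, t):
    """(2.3), including the limiting values at y = 0."""
    if y == 0.0:
        return 0.0 if t < 0 else (0.5 if t == 0 else 1.0)
    return upper_tail(0.5 * sqrt(y) - t / sqrt(y))

def script_l(y, t, zeta, mu):
    """(2.2)."""
    return y + mu * psi(zeta) * (alpha(y, t) + exp(t) * alpha(y, -t)) / (1.0 + zeta * exp(t))

# Symmetry (2.4): the two printed values agree up to rounding.
print(script_l(2.0, 0.7, 0.4, 5.0), script_l(2.0, -0.7, 1.0 / 0.4, 5.0))
```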


LEMMA 2.1. For fixed, positive $\zeta$, $\mu$, the equation $\partial\mathcal{L}/\partial y = 0$,

(a) has 2 positive roots in $y$ whenever $-\hat t(1/\zeta, \mu) < t < \hat t(\zeta, \mu)$, $t \ne 0$,
(b) 1 positive root whenever $t = 0$,
(c) 1 positive root whenever $t = -\hat t(1/\zeta, \mu)$ or $t = \hat t(\zeta, \mu)$,
(d) 0 positive roots otherwise,

where $\hat t(\zeta, \mu)$ is the unique positive root in $t$ of the equation

$$(\partial\mathcal{L}/\partial y)_{y=G(t)} = 0, \qquad G(t) = 2(\sqrt{t^2 + 1} - 1).$$

With respect to its argument $y$ the function $\mathcal{L}$ has, in case (a), a unique relative
maximum at the smaller, and a unique relative minimum at the larger of
the two roots; in case (b), it has a unique absolute minimum at the single root;
in case (c), $\mathcal{L}$ is strictly increasing in $y$, except at the single root; in case (d), it
is strictly increasing in $y$.

We note that the function $G(t)$, which is defined in the above lemma, is for
all $\zeta$, $\mu > 0$, and all $t$, the unique root in $y$ of $\partial^2\mathcal{L}/\partial y^2 = 0$.
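
The case distinction of Lemma 2.1 can be explored numerically. The Python sketch below relies on a closed form that is not spelled out in the text and is offered as an assumption: differentiating (2.2) and (2.3), with $\Phi$ read as the upper normal tail and using $e^t\phi(.5\sqrt{y} + t/\sqrt{y}) = \phi(.5\sqrt{y} - t/\sqrt{y})$ for the normal density $\phi$, gives $\partial\mathcal{L}/\partial y = 1 - \tfrac12\mu\psi(\zeta)\,y^{-1/2}\phi(.5\sqrt{y} - t/\sqrt{y})/(1 + \zeta e^t)$. For $t \ne 0$ this derivative is smallest at $y = G(t)$, so the number of positive roots follows from its sign there. Function names are hypothetical.

```python
from math import exp, pi, sqrt

def psi(z):
    return max(1.0, z)

def normal_density(u):
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def dl_dy(y, t, zeta, mu):
    """Assumed closed form of the y-derivative of (2.2), for y > 0."""
    k = 0.5 * mu * psi(zeta) / (1.0 + zeta * exp(t))
    return 1.0 - k * normal_density(0.5 * sqrt(y) - t / sqrt(y)) / sqrt(y)

def g_root(t):
    """G(t) = 2(sqrt(t^2 + 1) - 1)."""
    return 2.0 * (sqrt(t * t + 1.0) - 1.0)

def count_positive_roots(t, zeta, mu):
    """Number of positive roots of dL/dy = 0, following cases (a) to (d)."""
    if t == 0.0:
        return 1          # the derivative increases from -infinity to 1
    low = dl_dy(g_root(t), t, zeta, mu)   # minimum of the derivative over y > 0
    return 2 if low < 0.0 else (1 if low == 0.0 else 0)

for t in (-10.0, 0.0, 1.0, 10.0):
    print(t, count_positive_roots(t, 1.0, 50.0))
```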

LEMMA 2.2. $\hat t(\zeta, \mu)$ is

(1) for fixed $\mu > 0$, a positive, bounded function of $\zeta > 0$, strictly monotonic to
either side of a minimum at $\zeta = 1$.

(2) for fixed $\zeta > 0$, a strictly increasing function of $\mu$ which tends to 0 as $\mu \to 0$,
and to $\infty$, as $\mu \to \infty$ (uniformly for all $\zeta > 0$).

(3) a continuous function of non-negative $\zeta$ and positive $\mu$.

Lemma 2.3. 

(1) For any fixed positive ¢, bounded away from zero, lim,+« (f(¢, #)/in uw) = 1. 

(2) limy so [limes &(¢, w)/u*] = 1/169. 


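Let

$$A = \{(t, \zeta, \mu): -\infty < t < \infty, \ \zeta, \mu > 0\},$$
$$A_0 = \{(t, \zeta, \mu): t = 0, \ \zeta, \mu > 0\},$$
$$A_1 = \{(t, \zeta, \mu): -\hat t(1/\zeta, \mu) < t < \hat t(\zeta, \mu), \ \zeta, \mu > 0\},$$
$$A_2 = \{(t, \zeta, \mu): t = -\hat t(1/\zeta, \mu) \text{ or } t = \hat t(\zeta, \mu), \ \zeta, \mu > 0\},$$
$$A_3 = A_1 \cup A_2.$$

For all $(t, \zeta, \mu) \in A_3$, we define $\hat y(t, \zeta, \mu)$ to be either the larger or the unique
root in $y$ of the equation: $\partial\mathcal{L}/\partial y = 0$. By Lemma 2.1, $\mathcal{L}$ has a unique relative
minimum with respect to $y$ at $y = \hat y$, for all $(t, \zeta, \mu) \in A_1$. On the other hand, $\mathcal{L}$
is absolutely minimum at $y = 0$, for all $(t, \zeta, \mu) \in A - A_1$.

Now for all $(t, \zeta, \mu) \in A_3$, let

(2.5) $$U(t, \zeta, \mu) = \mathcal{L}(\hat y(t, \zeta, \mu), t, \zeta, \mu) - \mathcal{L}(0, t, \zeta, \mu),$$

and define

(2.6) $$y^*(t, \zeta, \mu) = \hat y(t, \zeta, \mu) \ \text{ if } (t, \zeta, \mu) \in A_1 \text{ and } U(t, \zeta, \mu) < 0, \qquad = 0 \ \text{ elsewhere in } A.$$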

Observe that by (2.4),

(2.7) $$\hat y(t, \zeta, \mu) = \hat y(-t, 1/\zeta, \mu),$$

and that this symmetry holds also for $U$ and $y^*$.

LEMMA 2.4.

(1) $\mathcal{L}$ has a unique absolute minimum with respect to $y$ at $y = y^*$, for all
$(t, \zeta, \mu) \in A$ with the exception of those points for which $U(t, \zeta, \mu) = 0$. When
$U(t, \zeta, \mu) = 0$, $\mathcal{L}$ is absolutely minimum with respect to $y$ at both $y = y^* = 0$
and $y = \hat y > 0$.

(2) $y^*(t, \zeta, \mu)$ is bounded above by $\mathcal{L}(0, t, \zeta, \mu) = \mu\psi(\zeta)\min(1, e^t)/(1 + \zeta e^t) < \mu$.

(3) $U(t, \zeta, \mu) \le 0$ implies that $(t, \zeta, \mu) \in A_1$.

(4) $U(0, \zeta, \mu) < 0$.

It will be convenient for our purpose from this point on to regard any second
sample size $\nu$ as ranging over the non-negative real numbers rather than
restricting it to 0 and the positive integers. As a consequence of this relaxation, it
follows from (2.1) and part (1) of the above lemma, that except when


(2.8) $U(a m \bar{X}_m - \ln gW, W, a^2 M) = 0$,

the Bayes second sample size function is unique and indeed

(2.9) $\nu^*(\bar{X}_m, g, W, M) = a^{-2} y^*(a m \bar{X}_m - \ln gW, W, a^2 M)$.


Thus, in effect, the Bayes second sample size is reduced to dependence on only
three arguments. In the exceptional case, which will be shown always to have
probability zero, we may, by part (1) of the above lemma, choose $\nu^*$ to be either
0 or $a^{-2}\hat{y}$.

Let v(t, ¢) = 2e2(1 — te’) — 1). By straightforward calculation, we find 
that for ¢ positive but not equal to one, and every positive y, u, 2/dy is mono- 
tonic in ¢ to either side of a minimum at the unique value of t for which v(t, £) = y. 
When ¢ = 1, and y and yx are positive, 9£/dy is monotonic in ¢ to either side of a 
minimum at? = 0. 

We are now in a position to describe the behavior of the value of $t$ for which
$\hat{y}$ is maximum. Thus, we introduce, below, the function $T$. (See statement one
of Lemma 2.6).

Lemma 2.5. For all $\zeta, \mu > 0$, $\zeta \neq 1$, the equation,


$\partial \mathcal{L}/\partial y \,|_{y = v(t, \zeta)} = 0$,

has a unique root in $t$. This root, $t = T(\zeta, \mu)$, say, has the following properties.


(1) (a) $T(\zeta, \mu) = -T(1/\zeta, \mu)$.
(b) $0 < T(\zeta, \mu) < \min[-\ln \zeta, \hat{t}(\zeta, \mu)]$, whenever $0 < \zeta < 1$.
(ce) limy.o (fg, w) = w/16m. 
(d) $\lim_{\mu \to 0} T(\zeta, \mu) = 0$, $\lim_{\mu \to \infty} T(\zeta, \mu) = -\ln \zeta$.

(2) Define $T(1, \mu) = 0$, then $T(\zeta, \mu)$ is
(a) for fixed $\mu > 0$, a strictly decreasing function of $\zeta > 0$.
(b) for fixed $\zeta$, $0 < \zeta < 1$, a strictly increasing function of $\mu > 0$.
(c) continuous in $\zeta, \mu > 0$.


Lemma 2.6. On the set, $A_1$, over which it is defined, $\hat{y}(t, \zeta, \mu)$ is positive and


(1) for fixed $\zeta, \mu$, strictly monotonic in $t$ to either side of a maximum at
$t = T(\zeta, \mu)$.

(2) for fixed $t, \mu$, bounded in $\zeta$ and strictly decreasing for $\zeta < 1$.

(3) for fixed $t, \zeta$, unbounded and strictly increasing in $\mu$.

(4) continuous in $t, \zeta, \mu$.


By part 4 of the above lemma, $U(t, \zeta, \mu)$, defined in (2.5), is continuous in
its arguments over its domain of definition, $A_1$. Differentiating $U$ with respect
to $t$, we find that


(2.10) $\partial U/\partial t \gtrless 0, \quad t \gtrless 0$.







By part 3 of Lemma 2.4, $U(-\hat{t}(1/\zeta, \mu), \zeta, \mu) > 0$, $U(\hat{t}(\zeta, \mu), \zeta, \mu) > 0$. Hence,
by part 4 of that lemma and (2.10), the equation, $U(t, \zeta, \mu) = 0$, has, for each
positive $\zeta$ and $\mu$, a unique negative and a unique positive root in $t$. If we denote
the positive root by $t = t^*(\zeta, \mu)$, then by the symmetry (2.4), (2.7), the
corresponding negative root must be $t = -t^*(1/\zeta, \mu)$. Hence, by (2.6), for each
positive $\zeta$ and $\mu$, we may write


(2.11) $y^*(t, \zeta, \mu) = \begin{cases} \hat{y}(t, \zeta, \mu), & -t^*(1/\zeta, \mu) < t < t^*(\zeta, \mu), \\ 0, & \text{otherwise}. \end{cases}$

It is apparent that


(2.12) $0 < t^*(\zeta, \mu) < \hat{t}(\zeta, \mu)$, $\zeta, \mu > 0$.

Also, we may now write the event (2.8) in the form


$a m \bar{X}_m = [\ln gW - t^*(1/W, a^2 M)]$ or $[\ln gW + t^*(W, a^2 M)]$,


from which it is clear that the probability of this event is always zero. We recall
that the Bayes second sample size, (2.9), is unique with the exception of the
above event. Hence, the Bayes rules, $S^*(g, W, M)$ in $C_m$, are unique up to sets of
probability zero.
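
The root $t^*(\zeta, \mu)$ is characterized above only through the sign pattern of $U$: negative at $t = 0$, positive at $t = \hat{t}(\zeta, \mu)$, and increasing in $t$ for $t > 0$. A minimal Python sketch of locating such a root by bisection follows. The function `toy_U` is a hypothetical stand-in with the same sign pattern, not the $U$ of (2.5), and the bracket `t_hat=3.0` is an assumed value chosen only for illustration.

```python
import math


def positive_root(U, t_hat, tol=1e-10):
    """Bisect for the unique root of U on (0, t_hat).

    Assumes U(0) < 0 < U(t_hat) and U increasing on [0, t_hat],
    which is the sign pattern described in the text.
    """
    lo, hi = 0.0, t_hat
    if not (U(lo) < 0.0 < U(hi)):
        raise ValueError("U must change sign on (0, t_hat)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if U(mid) < 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


def toy_U(t):
    # Hypothetical stand-in: negative at 0 and increasing for t > 0.
    return math.cosh(t) - 2.0


t_star = positive_root(toy_U, t_hat=3.0)
```

With the true $U$ in place of `toy_U`, the same bracketing would apply, since only the sign change and monotonicity are used.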

Lemma 2.7. $G(\hat{t}(1, \mu)) \le \hat{y}(t, \zeta, \mu) < \mu^2/8\pi$, for all $(t, \zeta, \mu) \in A_1$. ($G$ is
defined in Lemma 2.1.)

Proor. The lower bound follows from the first parts of Lemmas 2.6 and 2.2, 
the fact that G is symmetric about t = 0 and strictly increasing in | ¢ |, and the 
fact that by the definition of f in Lemma 2.1, g(t, ¢, uw) = G(t) for all 
(t, ¢, #) e Ay. The upper bound follows from Lemma 2.5 and part one of Lemma 
2.6, together with the fact that 9(7(1/f, »), 1/f, w) = 9(T CE, w), £, w), is for 
fixed 4 > 0, strictly monotonic to either side of a minimum at ¢ = 1, and 


$\lim_{\zeta \to 0} \hat{y}(T(\zeta, \mu), \zeta, \mu) = \lim_{\zeta \to \infty} \hat{y}(T(\zeta, \mu), \zeta, \mu) = \mu^2/8\pi$.


We note that for any $\mu > 0$, the lower bound of the above Lemma is attained
when $\zeta = 1$ and $t = \pm\hat{t}(1, \mu)$, the upper bound, for $t = T(\zeta, \mu)$, in the limit
as $\zeta \to 0$ or $\infty$.

Combining the above lemma with part 2 of Lemma 2.4, and (2.9), we have
immediately the following.

Theorem 2.1. $a^{-2} G(\hat{t}(1, a^2 M)) \le \nu^*(\bar{X}_m, g, W, M) < \min[M, a^2 M^2/8\pi]$,
whenever


$[\ln gW - t^*(1/W, a^2 M)] < a m \bar{X}_m < [\ln gW + t^*(W, a^2 M)]$,

i.e. whenever the Bayes second sample size is positive.
Lemma 2.8.

(1) For all finite $t$ and for all $\zeta > 0$, $\lim_{\mu \to \infty} [\hat{y}(t, \zeta, \mu)/\ln \mu] = 8$.


(2) Let $\delta$ be an arbitrary, fixed, positive number less than 1, and define
$t_\delta(\mu) = (1 - \delta) \ln \mu$. Then for all positive $\zeta$ bounded away from zero,


$\lim_{\mu \to \infty} [\hat{y}(t_\delta(\mu), \zeta, \mu)/\ln \mu] = 2(1 + \sqrt{\delta})^2$.







Proor. The identity, 
8£/0Y \m usm = 90, (t, £m) € Ay, 
may be written in the form 
g(t, fw) — a(t, f,u)) — Sing = 0, 
where 
a(t, f,u) = Af In g(t, f, w) + 2 ln [2V2e(1 + fe) /H(F)] 
—t+ €/H(t, , w/t fw). 


By part 3 of Lemma 2.6, for all bounded ¢ and all ¢ > 0, p(t, ¢, w) tends to zero 
as 4 — *, which proves part (1). When ¢ = &(), the identity may be put in 
the form 


(1 + Apu(t, w)Iri(f, w) — 4(1 + 8)(1 — poslt, w)Ir(¥, w) + 4(1 — 8)? = 0, 
where 

ro(f, ue) = G(te(u), f, )/ Ing, 

pul(f, wu) = Ing(t(u), f, w)/G(b(m), f, #), 

pu(t, mw) = Zin (2V/2e(t + uw") /W(E)I/(1 + 8) Ing. 


Now let $\zeta$ be positive and bounded away from zero. By Lemma 2.3, when $\mu$ is
sufficiently large, $t_\delta(\mu) < \hat{t}(\zeta, \mu)$. Hence $(t_\delta(\mu), \zeta, \mu) \in A_1$ for $\mu$ sufficiently large.
We may now use the lower bound of Lemma 2.7 (which tends to $\infty$ with $\mu$)
to show that $\rho_1(\zeta, \mu)$ tends to zero as $\mu \to \infty$. $\rho_2(\zeta, \mu)$ obviously tends to zero as
$\mu \to \infty$. Thus, for $\mu$ sufficiently large, the quadratic equation in $r$,


$[1 + 4\rho_1(\zeta, \mu)] r^2 - 4(1 + \delta)[1 - \rho_2(\zeta, \mu)] r + 4(1 - \delta)^2 = 0$,


has two real roots in $r$, one of which, by the identity, must be equal to $r_\delta(\zeta, \mu)$.
Hence, in the limit, as $\mu \to \infty$, $r_\delta(\zeta, \mu)$ must be equal to one or the other of
$2(1 \pm \sqrt{\delta})^2$. By Lemmas 2.7 and 2.3,


$\lim_{\mu \to \infty} r_\delta(\zeta, \mu) \ge \lim_{\mu \to \infty} G(\hat{t}(1, \mu))/\ln \mu = 2$.


Since $2(1 - \sqrt{\delta})^2 < 2$, the conclusion of part 2 follows.
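
The last step of this proof rests on the roots of the limiting quadratic, obtained by letting both correction terms vanish, so that the equation becomes $r^2 - 4(1 + \delta) r + 4(1 - \delta)^2 = 0$. The short Python check below is only a numerical sanity check of that algebra, not part of the original argument; the function names are illustrative.

```python
import math


def limiting_roots(delta):
    """Roots of r^2 - 4(1 + delta) r + 4(1 - delta)^2 = 0, smaller first."""
    b = -4.0 * (1.0 + delta)
    c = 4.0 * (1.0 - delta) ** 2
    disc = math.sqrt(b * b - 4.0 * c)  # equals 8 * sqrt(delta)
    return (-b - disc) / 2.0, (-b + disc) / 2.0


def closed_form(delta):
    """The two candidate limits 2(1 - sqrt(delta))^2 and 2(1 + sqrt(delta))^2."""
    s = math.sqrt(delta)
    return 2.0 * (1.0 - s) ** 2, 2.0 * (1.0 + s) ** 2


checks = []
for delta in (0.1, 0.25, 0.5, 0.9):
    small, large = limiting_roots(delta)
    cf_small, cf_large = closed_form(delta)
    checks.append((delta, small, large, cf_small, cf_large))
```

For every $\delta$ in $(0, 1)$ the smaller root lies below 2, which is why the lower bound of 2 on the limit selects the larger root.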
Lemma 2.9. $t^*(\zeta, \mu)$ is


(1) for fixed $\mu > 0$, a positive, bounded function of $\zeta > 0$, strictly monotone to
either side of a minimum at $\zeta = 1$.

(2) for fixed $\zeta > 0$, a strictly increasing function of $\mu$ which tends to 0 as $\mu \to 0$
and to $\infty$ as $\mu \to \infty$ (uniformly, for all $\zeta > 0$).

(3) a continuous function of positive $\zeta$ and $\mu$.


Proof. That $t^*$ is, for fixed $\mu > 0$, a positive, bounded function of $\zeta > 0$, and
tends to zero with $\mu$, uniformly for all $\zeta > 0$, follows from (2.12) and Lemma 2.2.







By (2.5) and (2.2), U(t,¢,u) = 9(t,f, 4) + h(E) A(t, &, w)/(1 + fe’), where 
fort = 0, H(t, ¢, uw) = e'a(G(t, ¢, w), —t) + a(G(t, ¢, w), t) — 1. H depends 
on ¢ and yw only through g. By Lemma 2.6, 9, and hence U, is continuous over 
A, . Thus, using (2.10) and the lower bound of Lemma 2.7, it may be shown 
that for every positive ¢ and yw, there exists a positive number k, such that 
Osts (fo, 4) +k = 0aU/d¢ 2 0,¢ S 1. It then follows that for any fixed 
up > 0, t* is strictly monotone to either side of a minimum at ¢ = 1. For suppose 
not, then there exist ¢’, ¢” such that eitherO0 < ¢’ <¢” Slorls it” <i’ < ~ 


such that (*(¢”, w) 2 t*(¢%’, uw), and hence we would have that 
0 = U(t(t', w), 8,4) < Ul’, w), ow) S UCR", w), £", w) = 9, 


a contradiction. In a similar way, it may be shown that for every fixed $\zeta > 0$,
$t^*(\zeta, \mu)$ is strictly increasing with $\mu$. To complete the proof of part 2, it is sufficient
to prove that for any fixed positive $\delta < 1$,


(2.13) $t^*(1, \mu) > (1 - \delta) \ln \mu = t_\delta(\mu)$, $\mu$ sufficiently large.

To show this, we need only show that

$U(t_\delta(\mu), 1, \mu) < 0$, $\mu$ sufficiently large.


Using Lemma 2.8 and a well known inequality on Mills' ratio [1], we find that
$\lim_{\mu \to \infty} H(t_\delta(\mu), 1, \mu) = -1$. Hence, applying Lemma 2.8 once more, we find that


lity (U (ts(w), 1, @)/ In pw] = 2(1 + V5) — lim, [u/(1 + p”)) =—o2, 


This completes the proof of part 2. Continuity may be proved using a device
employed by Wald and Wolfowitz in [6]. Let $\zeta, \mu$ be arbitrary, fixed, positive.
Let $K_1, K_2$ be any two numbers such that


$0 < K_1 < t^*(\zeta, \mu) < K_2$, $K_2 - K_1 < \Delta$,

where $\Delta$ is an arbitrarily small positive number. By (2.10),

$U(K_1, \zeta, \mu) < 0 < U(K_2, \zeta, \mu)$.


Let $\Delta\zeta, \Delta\mu$ be non-zero increments in $\zeta$ and $\mu$, respectively, which tend to zero
with $\Delta$, and such that $\zeta + \Delta\zeta > 0$, $\mu + \Delta\mu > 0$; then, since $U$ is a continuous
function of its arguments, we have, for $\Delta$ sufficiently small, that


$U(K_1, \zeta + \Delta\zeta, \mu + \Delta\mu) < 0 < U(K_2, \zeta + \Delta\zeta, \mu + \Delta\mu)$.


Hence, for $\Delta$ sufficiently small, $K_1 < t^*(\zeta + \Delta\zeta, \mu + \Delta\mu) < K_2$. This completes
the proof.


3. Bayes rules with preassigned invariant error probabilities. Below, we
consider the error probabilities associated with the Bayes rules, $S^*(g, W, M)$, in
$C_m$, in terms of their dependence on the Bayes parameters and the first sample
size $m$. ($a$, the distance between the means of $f_0$ and $f_1$, is always regarded as
arbitrary, fixed, positive.) The properties developed lead to sufficient conditions
under which the parameters, $W$ and $M$, may be chosen, for arbitrary, positive
$m$ and $g$, so that the error probabilities take on preassigned, fixed values. This
leads to a class of rules, parametrized by $m$ and $g$ and the preassigned error
probabilities, each member of which minimizes the average ($g$) expected number
of observations among all rules in $C_m$ with error probabilities less than or equal
to those preassigned. It is pointed out how rules, within the same subclass,
which minimize the maximum expected sample size, may be obtained by proper
selection of $g$.

Let $p_i(t \mid z, \lambda) = (1/\sqrt{2\pi z}) \exp\{-[t + (\tfrac{1}{2} - i)z + \ln \lambda]^2/2z\}$. By the results
obtained in the preceding section, and in particular, by (2.9), we may write
the expected overall sample size, under $f_i$, required by the Bayes rule,
$S^*(g, W, M)$, in the form


(3.1) $\mathcal{E}_i(S^*(g, W, M)) = a^{-2} \mathcal{E}_i^*(a^2 m, gW, W, a^2 M)$,


where

$\mathcal{E}_i^*(z, \lambda, \zeta, \mu) = z + \int_{-\infty}^{\infty} y^*(t, \zeta, \mu)\, p_i(t \mid z, \lambda)\, dt$, $z, \lambda, \zeta, \mu > 0$,


$\mathcal{E}_i^*(0, \lambda, \zeta, \mu) = y^*(-\ln \lambda, \zeta, \mu)$, $\lambda, \zeta, \mu > 0$.


Similarly, the probability, under $f_i$, that $S^*(g, W, M)$ will lead to decision
$1 - i$, may be written in the form


(3.2) $Q_i(S^*(g, W, M)) = Q_i^*(a^2 m, gW, W, a^2 M)$,


where


$Q_i^*(z, \lambda, \zeta, \mu) = \int_{-\infty}^{\infty} \alpha(y^*(t, \zeta, \mu), (1 - 2i)t)\, p_i(t \mid z, \lambda)\, dt$, $z, \lambda, \zeta, \mu > 0$,


(3.3) $Q_i^*(0, \lambda, \zeta, \mu) = \alpha(y^*(-\ln \lambda, \zeta, \mu), (2i - 1) \ln \lambda)$, $\lambda, \zeta, \mu > 0$.

Note that the symmetry, (2.7), implies the corresponding symmetries

(3.4) $\mathcal{E}_0^*(z, \lambda, \zeta, \mu) = \mathcal{E}_1^*(z, 1/\lambda, 1/\zeta, \mu)$, $Q_0^*(z, \lambda, \zeta, \mu) = Q_1^*(z, 1/\lambda, 1/\zeta, \mu)$.


These imply, as a special case, that the rules, $S^*(1, 1, M)$, are minimax in $C_m$ with
respect to wrong decision losses that are both equal to $M$. This fact was noted
by Wald ([7], pp. 151-156) for the case, $M = 1$.
Lemma 3.1. Let $\delta$ be an arbitrary, fixed, positive number less than one, then
$\lim_{\mu \to \infty} Q_0^*(z, \lambda, \zeta, \mu) = 0$, uniformly for all $z, \lambda, \zeta$ such that $z \ge 0$, $\zeta > 0$, $\lambda \ge \delta$.

Proof. By (3.3), (2.3), (2.11), whenever $z, \lambda, \zeta, \mu > 0$,


(3.5) $Q_0^*(z, \lambda, \zeta, \mu) = \int_{t^*(\zeta, \mu)}^{\infty} p_0(t \mid z, \lambda)\, dt + \int_{-t^*(1/\zeta, \mu)}^{t^*(\zeta, \mu)} \phi(A(t, \zeta, \mu))\, p_0(t \mid z, \lambda)\, dt$,


where $A(t, \zeta, \mu) = .5(\hat{y}(t, \zeta, \mu))^{1/2} - t/(\hat{y}(t, \zeta, \mu))^{1/2}$. By (2.13), for $\mu$ sufficiently
large (uniformly for all $z, \lambda, \zeta > 0$), the right hand side of (3.5) is bounded
above by


$\int_{(1 - \delta) \ln \mu}^{\infty} p_0(t \mid z, \lambda)\, dt + \int_{-t^*(1/\zeta, \mu)}^{(1 - \delta) \ln \mu} \phi(A(t, \zeta, \mu))\, p_0(t \mid z, \lambda)\, dt$.


For $\mu$ sufficiently large (uniformly for all $z, \zeta > 0$, $\lambda \ge \delta$), the first of these terms
is bounded above by $\phi((2[(1 - \delta) \ln \mu + \ln \delta])^{1/2})$, which tends to zero as
$\mu \to \infty$. On the other hand, by Lemma 2.7, the second term is, for $\mu$ sufficiently
large (uniformly for all $z, \lambda, \zeta > 0$), bounded above by


(3.6) $\phi(.5\sqrt{G(\hat{t}(1, \mu))} - (1 - \delta) \ln \mu/\sqrt{G(\hat{t}(1, \mu))})$.


By Lemma 2.3, $\lim_{\mu \to \infty} [G(\hat{t}(1, \mu))/\ln \mu] = 2$. Thus, the argument of $\phi$ in (3.6)
is, for large $\mu$, asymptotically equivalent to $\delta(\tfrac{1}{2} \ln \mu)^{1/2}$. Hence (3.6) tends to zero
as $\mu \to \infty$.

Finally, for $\mu$ sufficiently large (uniformly for all $\zeta > 0$, $\lambda \ge \delta$), $Q_0^*(0, \lambda, \zeta, \mu)$
is also bounded above by (3.6). This completes the proof of the lemma.

Define $Q_i^*(z, \lambda, \zeta, 0) = \lim_{\mu \to 0} Q_i^*(z, \lambda, \zeta, \mu)$, $z \ge 0$, $\lambda, \zeta > 0$. Note that the
symmetry of (3.4) continues to hold in the limit as $\mu \to 0$. By (3.5) and Lemma
2.9,


(3.7) $Q_0^*(z, \lambda, \zeta, 0) = \phi(.5\sqrt{z} + \ln \lambda/\sqrt{z})$, $z, \lambda, \zeta > 0$.


Observe that the above expression is independent of $\zeta > 0$ and is a continuous
function of positive $z$ and $\lambda$. For $z = 0$, we have by (3.3) and (2.3), that
$Q_0^*(0, \lambda, \zeta, 0) = 1$, $.5$, or $0$, according as $\lambda <$, $=$, or $> 1$.

Lemma 3.2. Let $K$ be an arbitrary, fixed, positive number, then


$\lim_{\lambda \to 0} Q_0^*(z, \lambda, \zeta, \mu) = 1$, $\lim_{\lambda \to \infty} Q_0^*(z, \lambda, \zeta, \mu) = 0$,


uniformly for all $z, \zeta, \mu$ such that $0 \le z, \mu \le K$, $\zeta > 0$.

Proof. When $\mu = 0$, it is obvious from the definition of $Q_0^*(z, \lambda, \zeta, 0)$ that
the limits hold uniformly for $0 \le z \le K$, $\zeta > 0$. When $z = 0$, it follows from
(3.3), (2.3), (2.11), that the limits hold uniformly for $\zeta > 0$, $0 < \mu \le K$. Finally,
for any $z, \lambda, \zeta, \mu$ such that $0 < z, \mu \le K$; $\lambda, \zeta > 0$, we have by (3.5), that


$\int_{t^*(\zeta, \mu)}^{\infty} p_0(t \mid z, \lambda)\, dt < Q_0^*(z, \lambda, \zeta, \mu) < \int_{-t^*(1/\zeta, \mu)}^{\infty} p_0(t \mid z, \lambda)\, dt$.

For sufficiently small $\lambda > 0$, the left hand side of the above inequality is bounded
below by $\phi(.5\sqrt{K} + [\ln \lambda + t^*(\zeta, K)]/\sqrt{K})$. Since, by Lemma 2.9, $t^*(\zeta, K)$
is bounded, this lower bound tends to one, as $\lambda \to 0$, uniformly for all $\zeta > 0$.
On the other hand, for sufficiently large $\lambda$, the right hand side of the inequality is
bounded above by $\phi([\ln \lambda - t^*(1/\zeta, K)]/\sqrt{K})$, which tends to zero, as $\lambda \to \infty$,
uniformly for all $\zeta > 0$. This completes the proof.

For any positive $r$ and $z$, the equation,


(3.8) $Q_0^*(z, \lambda, \zeta, 0) = r Q_1^*(z, \lambda, \zeta, 0)$,


has, by (3.7), (3.4), a unique positive root in $\lambda$. We shall denote, by $\xi(r, z)$, the
value common to both sides of (3.8), when $\lambda$ is equal to this root. Recall that
both sides of the above equation are independent of $\zeta$. In the following lemma,
we state, without proof, several properties of the function, $\xi$.

Lemma 3.3. $\xi(r, z)$ is


(1) for fixed $z > 0$, a strictly increasing function of $r > 0$.

(2) for fixed $r > 0$, a strictly decreasing function of $z > 0$.

(3) a continuous function of positive $r$ and $z$.

(4) $\lim_{r \to 0} \xi(r, z) = 0$, $\xi(1, z) = \phi(.5\sqrt{z})$, $\lim_{r \to \infty} \xi(r, z) = 1$.
(5) $\lim_{z \to 0} \xi(r, z) = r/(1 + r)$, $\lim_{z \to \infty} \xi(r, z) = 0$.
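
The function $\xi$ is defined only implicitly through (3.8), so a small numerical sketch may help. The Python code below solves (3.8) by bisection on $\ln \lambda$ and returns the common value. It assumes that the function appearing in (3.7) is the standard normal upper tail probability, which is consistent with the values 1, .5, 0 quoted for $z = 0$; the bracket of $\pm 50$ for $\ln \lambda$ and the function names are choices made here, not taken from the paper.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def upper_tail(x):
    """P(Z > x) for a standard normal Z (assumed meaning of the tail function)."""
    return 1.0 - _STD.cdf(x)


def q0_limit(z, lam):
    """Right-hand side of (3.7): error probability under f_0 when mu = 0."""
    return upper_tail(0.5 * math.sqrt(z) + math.log(lam) / math.sqrt(z))


def q1_limit(z, lam):
    """By the symmetry (3.4), Q_1 at lambda equals Q_0 at 1/lambda."""
    return q0_limit(z, 1.0 / lam)


def xi(r, z, iterations=200):
    """Common value of both sides of (3.8), found by bisection on ln(lambda)."""
    lo, hi = -50.0, 50.0  # assumed bracket for ln(lambda)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        lam = math.exp(mid)
        # Q_0 decreases and Q_1 increases in lambda, so the difference decreases.
        if q0_limit(z, lam) - r * q1_limit(z, lam) > 0.0:
            lo = mid
        else:
            hi = mid
    return q0_limit(z, math.exp(0.5 * (lo + hi)))
```

For example, `xi(1.0, 4.0)` reproduces the middle statement of part (4) with $z = 4$, and `xi(2.0, 1e-6)` is close to the limit $r/(1 + r) = 2/3$ of part (5).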


We shall require, finally, for the proof of our theorem, the use of a lemma
which is proved by T. Rado and P. V. Reichelderfer in [5], Lemma 16, p. 390.
The lemma is paraphrased below.

Lemma 3.4. Given 


(a) A bounded, simply connected Jordan region, J, in the complex plane. 

(b) The arbitrarily oriented boundary curve, c, of J. 

(c) A continuous, real or complex valued function, s(w) in J which is different 
from zero in J. 


Then V. Argument s(w) = 0, i.e. the variation in the argument of s(w) on c is 
zero. 


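Theorem 3.1. Let $\delta$ be an arbitrary positive number less than 1, and define the
set


$A_\delta = \{(\beta, r, z, \gamma): 0 < \beta \le \xi(r, z),\ r > 0,\ z > 0,\ \delta < \gamma < 1/\delta\}$.


Then for each point, $\theta = (\beta, r, z, \gamma)$ in $A_\delta$, there exist numbers $\zeta^*(\theta) > 0$ and
$\mu^*(\theta) \ge 0$ such that


$Q_0^*(z, \gamma\zeta^*(\theta), \zeta^*(\theta), \mu^*(\theta)) = r Q_1^*(z, \gamma\zeta^*(\theta), \zeta^*(\theta), \mu^*(\theta)) = \beta$.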


Proof. For any point in $A_\delta$ such that $\beta = \xi(r, z)$, the conclusion is obvious.

Let $(\beta, r, z, \gamma)$ be an arbitrary, but fixed point in $A_\delta$ such that $\beta < \xi(r, z)$.
For this proof only, the letter $i$ will be used to denote the imaginary unit. We
define the complex variable, $w = \zeta + i\mu$, and the complex function of this
complex variable, $s(w) = s_0(w) + i s_1(w)$, where


$s_0(w) = Q_0^*(z, \gamma\zeta, \zeta, \mu) - r Q_1^*(z, \gamma\zeta, \zeta, \mu)$,


and $s_1(w) = Q_0^*(z, \gamma\zeta, \zeta, \mu) - \beta$. Since Lemma 3.1 is obviously true with its $\delta$
replaced by the square of the $\delta$ of the present theorem, we have by that lemma
and by the symmetry (3.4), that there exists a positive value of $\mu$, $\mu_\delta$, say, such
that

‘ { 8, ( w) < 0, 
ve « Vaile) < s(w), 







By Lemma 3.2, taking $K = \mu_\delta$, and by the symmetry (3.4) there exists a positive
value of $\zeta$, $\zeta_\delta$, say, which is less than one, such that


(3.10) $0 < \zeta \le \zeta_\delta \Rightarrow s_0(w) > 0$, and $1/\zeta_\delta \le \zeta < \infty \Rightarrow s_0(w) < 0$, for all $\mu$, $0 \le \mu \le \mu_\delta$.


Let $c_j$, $j = 1, 2, 3, 4$, denote the line segments with complex endpoints, as
follows.


$c_1$: $\zeta_\delta + i\mu_\delta$, $\zeta_\delta$; $c_2$: $\zeta_\delta$, $1/\zeta_\delta$;
$c_3$: $1/\zeta_\delta$, $1/\zeta_\delta + i\mu_\delta$; $c_4$: $1/\zeta_\delta + i\mu_\delta$, $\zeta_\delta + i\mu_\delta$.


The rectangle $c$ composed of these four line segments is the boundary of a simply
connected Jordan region in the complex $w$ plane. By (3.10), $s_0(w)$, which is the
real part of $s(w)$, is positive everywhere on $c_1$, including the endpoints of this
line segment. Thus, the image of $c_1$ under $s$ lies entirely to the right of the
imaginary axis. Similarly, by (3.10), the image of $c_3$ under $s$ lies entirely to the left
of the imaginary axis. As $w$ moves on $c_2$, from $\zeta_\delta$ to $1/\zeta_\delta$, by (3.7) and the
symmetry (3.4), $s_0(w)$ decreases monotonically from its positive value at $w = \zeta_\delta$
to its negative value at $w = 1/\zeta_\delta$, taking on the value zero precisely once at the
value of $\zeta$ for which $\gamma\zeta$ is equal to the unique root in $\lambda$ of the equation (3.8).
Since $s_1(w)$, which is the imaginary part of $s(w)$, is equal to $\xi(r, z) - \beta$ at this
point, and since $\beta < \xi(r, z)$, the image of $c_2$ crosses the imaginary axis precisely
once, at a point above the real axis. On the other hand, by (3.5), the symmetry
(3.4), and the lemmas of section 2, $s_0(w)$ is a continuous function of $\zeta$
everywhere on $c_4$. Hence, it follows that as $w$ moves from $1/\zeta_\delta + i\mu_\delta$ to $\zeta_\delta + i\mu_\delta$, on
$c_4$, $s_0(w)$, starting negative and ending positive, must take on the value zero at
least once. By (3.9), however, each time that it does, $s_1(w)$ must be negative.
Thus, the image of $c_4$ under $s$ must cross the imaginary axis an odd number of
times and each time it does so, the crossing must be made below the real axis.
It is evident that as $w$ describes a path about $c$ in either direction and returns
to the initial point, the argument of $s(w)$ must increase or decrease by $2\pi$.
But this contradicts the conclusion of Lemma 3.4. Hence $s(w)$ must have at
least one zero inside of $c$. This proves the theorem.
We remark that by (3.3), (2.3), (2.11),


(3.11) $Q_i^*(0, 1, 1, \mu) = \phi(\tfrac{1}{2}\sqrt{\hat{y}(0, 1, \mu)})$.


Hence, by Lemmas 2.6, 2.7, the conclusion of the above theorem holds also for
the points $\theta = (\beta, 1, 0, 1)$, with $0 < \beta \le \tfrac{1}{2}$.

Corollary 3.1. Let $\theta^* = (\beta_0, \beta_0/\beta_1, a^2 m, g)$, where $a$ and $g$ are arbitrary, but
fixed, positive numbers, $m$ is a positive first sample size, and $\beta_0$ and $\beta_1$ are any two
preassigned numbers such that


$0 < \beta_0 \le \xi(\beta_0/\beta_1, a^2 m)$, $0 < \beta_1 < 1$.


Let $W^* = \zeta^*(\theta^*)$, $M^* = a^{-2}\mu^*(\theta^*)$, then

(3.12) $Q_i(S^*(g, W^*, M^*)) = \beta_i$,

and if $S$ is any rule in $C_m$ such that


(3.13) $Q_i(S) \le \beta_i$,

then


$\sum_{i=0}^{1} g^i \mathcal{E}_i(S^*(g, W^*, M^*)) \le \sum_{i=0}^{1} g^i \mathcal{E}_i(S)$.


Proof. The first conclusion follows immediately from the theorem and (3.2).
The second conclusion follows from Lemma 1.1.


By the remark concerning (3.11), the corollary may be extended to a zero
first sample size when $\beta_0/\beta_1 = g = 1$.

If we can now choose $g$ so that $\mathcal{E}_0(S^*(g, W^*, M^*)) = \mathcal{E}_1(S^*(g, W^*, M^*))$, the
resulting rule will minimize the maximum expected overall sample size among
all rules in $C_m$ with error probabilities less than or equal to the ones which it
possesses.

For example, if we take $\beta_0 = \beta_1 = \beta$, say, where


(3.14) $0 < \beta \le \xi(1, a^2 m) = \phi(\tfrac{1}{2} a \sqrt{m}) \le \tfrac{1}{2}$,


and we choose $g = 1$, then by (3.4), we may take $\zeta^*(\beta, 1, a^2 m, 1) = 1$, and
Lemma 3.1, (3.7), and (3.11) will ensure the existence of the corresponding
value $\mu^*(\beta, 1, a^2 m, 1)$. Thus, if we let

(3.15) $S(a, \beta, m) = S^*(1, 1, a^{-2}\mu^*(\beta, 1, a^2 m, 1))$,


we have, subject to (3.14), that $Q_0(S(a, \beta, m)) = Q_1(S(a, \beta, m)) = \beta$. But
in addition, by (3.1), (3.4),


(3.16) $\mathcal{E}_0(S(a, \beta, m)) = \mathcal{E}_1(S(a, \beta, m)) = a^{-2}\mathcal{E}(\beta, a^2 m)$, say.

Hence, by the corollary and the remark which follows Theorem 3.1, $S(a, \beta, m)$
has, for any non-negative first sample size $m$, the property that

(3.17) $\max_{i=0,1} \mathcal{E}_i(S(a, \beta, m)) \le \max_{i=0,1} \mathcal{E}_i(S)$,

for all $S$ in $C_m$ such that


(3.18) $Q_i(S) \le \beta$, $i = 0, 1$.


The selection of an optimum first sample size for this example is considered
in Section 4.

If it were possible to choose $W^*$ and $M^*$ so as to satisfy (3.12) and in addition
also to satisfy the requirement that $\mathcal{E}_i(S^*(g, W^*, M^*))$ be independent of $g$, the
resulting rules would minimize the expected overall sample size simultaneously
under both densities (2.1), among all rules in $C_m$ which satisfy (3.13). It is
conjectured that in the present case, this requirement is impossible to fulfill. As
shown by Wald and Wolfowitz in [6], the analogous result in the sequential case
is actually achieved by the sequential probability ratio test.

In conclusion, we remark that the above results do not overlap with those of [2].
In [2], a Bayes rule in $C_m$ is found for testing the composite hypothesis that the
mean of a normal distribution with known variance is positive against the com- 
posite hypothesis that it is less than or equal to zero. The prior distribution of the 
mean is taken to be its fiducial distribution based on the outcome of the first 
sample. The loss is taken negatively proportional to the mean when a correct 
choice is made, and zero otherwise. Cost is proportional to the number of observa- 
tions. The solution is shown to be admissible with respect to the loss function 
chosen. 


4. Two-stage rules which minimize total expected sample size. Continuing
our example of the preceding section, we have by (3.16), (3.1) that


(4.1) $\mathcal{E}(\beta, z) = \mathcal{E}_i^*(z, 1, 1, \mu^*(\beta, 1, z, 1))$.


Using the expressions which follow (3.1), we find that


(4.2) $\mathcal{E}_i^*(z, 1, 1, \mu) = z + \sqrt{2/\pi z} \int_0^{t^*(1, \mu)} I(t, z, \mu)\, dt$, $z, \mu > 0$,


where $I(t, z, \mu) = \hat{y}(t, 1, \mu) \cosh(t/2) \exp[-(z/8) - (t^2/2z)]$;

(4.3) $\mathcal{E}_i^*(0, 1, 1, \mu) = \hat{y}(0, 1, \mu)$, $\mu > 0$;

(4.4) $\mathcal{E}_i^*(z, 1, 1, 0) = \lim_{\mu \to 0} \mathcal{E}_i^*(z, 1, 1, \mu) = z$, $z > 0$.


In a similar way we may find simplified expressions for $Q_i^*(z, 1, 1, \mu)$.
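
The step from the general expression following (3.1) to (4.2) can be spelled out as follows. This is a reconstruction of the intermediate algebra, assuming the density $p_i(t \mid z, \lambda)$ as given at the start of Section 3 and using (2.7) and (2.11) with $\zeta = 1$; it is offered as a sketch rather than as the author's own derivation.

```latex
\begin{align*}
p_i(t \mid z, 1)
  &= \frac{1}{\sqrt{2\pi z}}
     \exp\Bigl\{-\frac{[t + (\tfrac12 - i) z]^2}{2z}\Bigr\}
   = \frac{1}{\sqrt{2\pi z}}
     \exp\Bigl\{-\frac{t^2}{2z} - \frac{z}{8} - (\tfrac12 - i)\,t\Bigr\},
     \qquad i = 0, 1, \\
p_i(t \mid z, 1) + p_i(-t \mid z, 1)
  &= \frac{1}{\sqrt{2\pi z}}
     \exp\Bigl\{-\frac{t^2}{2z} - \frac{z}{8}\Bigr\}
     \bigl(e^{t/2} + e^{-t/2}\bigr)
   = \sqrt{\frac{2}{\pi z}}\,
     \cosh(t/2)\,\exp\Bigl\{-\frac{z}{8} - \frac{t^2}{2z}\Bigr\}, \\
\int_{-\infty}^{\infty} y^*(t, 1, \mu)\, p_i(t \mid z, 1)\, dt
  &= \int_{0}^{t^*(1,\mu)} \hat{y}(t, 1, \mu)\,
     \bigl[p_i(t \mid z, 1) + p_i(-t \mid z, 1)\bigr]\, dt
   = \sqrt{\frac{2}{\pi z}} \int_{0}^{t^*(1,\mu)} I(t, z, \mu)\, dt .
\end{align*}
```

The second line does not depend on $i$, which is why the same expression serves under both densities.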

To motivate the lemma which follows and with reference to (3.14) we note
that by part 4 of Lemma 3.3, the inequalities $0 < \beta \le \xi(1, z) \le \tfrac{1}{2}$ are
equivalent to the inequalities


(4.5) $0 \le z \le 4\lambda_\beta^2$, $0 < \beta \le \tfrac{1}{2}$,


where $\lambda_\beta$ is the unique root of the equation $\phi(\lambda) = \beta$.
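
As a numerical illustration of (4.5), the endpoint $4\lambda_\beta^2$ can be computed directly. The Python sketch below assumes that the tail function in the defining equation is the standard normal upper tail probability, an assumption that reproduces the values 10.8222 and 21.6476 appearing in Tables I and II; the function names are illustrative.

```python
from statistics import NormalDist


def lambda_beta(beta):
    """Root of the tail equation, reading it as P(Z > lambda) = beta."""
    return NormalDist().inv_cdf(1.0 - beta)


def z_upper_limit(beta):
    """Right endpoint 4 * lambda_beta^2 of the interval for z in (4.5)."""
    return 4.0 * lambda_beta(beta) ** 2


limit_05 = z_upper_limit(0.05)
limit_01 = z_upper_limit(0.01)
```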
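Lemma 4.1. $\mathcal{E}(\beta, 0) = \mathcal{E}(\beta, 4\lambda_\beta^2) = 4\lambda_\beta^2$.

Proof. By (3.7)


$Q_0^*(z, 1, 1, 0) = \phi(\tfrac{1}{2}\sqrt{z})$.

Hence we may take $\mu^*(\beta, 1, 4\lambda_\beta^2, 1) = 0$. Thus, by (4.1), (4.4),

$\mathcal{E}(\beta, 4\lambda_\beta^2) = 4\lambda_\beta^2$.

On the other hand, by (3.3), (2.3),

$Q_i^*(0, 1, 1, \mu) = \phi(\tfrac{1}{2}\sqrt{\hat{y}(0, 1, \mu)})$.

Hence, by (4.1), (4.3),

$\mathcal{E}(\beta, 0) = \hat{y}(0, 1, \mu^*(\beta, 1, 0, 1)) = 4\lambda_\beta^2$.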







Theorem 4.1.


1. For each $a > 0$ and each $\beta$, $0 < \beta \le \tfrac{1}{2}$, there exists a zero or positive integral
value of $m < 4a^{-2}\lambda_\beta^2$, call it $\hat{m}(a, \beta)$, such that the rule


(4.6) $S(a, \beta, \hat{m}(a, \beta)) = S^*(1, 1, a^{-2}\mu^*(\beta, 1, a^2\hat{m}(a, \beta), 1))$


minimizes the maximum expected total sample size among all two-stage rules $S$,
with integral first sample size, which satisfy (3.18).

2. Whenever $a \ge 2\lambda_\beta$, we may take $\hat{m}(a, \beta) = 0$.

Proof. Clearly, if $a^2 m \ge 4\lambda_\beta^2$, then any two-stage rule in $C_m$ will have, under
either hypothesis, an expected total sample size $\ge 4a^{-2}\lambda_\beta^2$. By (3.16), (3.17), and
Lemma 4.1, we may thus restrict our consideration to the rules $S(a, \beta, m)$, with
$a^2 m < 4\lambda_\beta^2$. Since only finitely many multiples of $a^2$ are bounded above by $4\lambda_\beta^2$,
part 1 of the theorem follows. Part 2 is immediate.

It is of interest to note that by (2.9), (3.1) and Lemma 4.1, the second sample
size specified by (4.6) when $\hat{m}(a, \beta) = 0$ is $4a^{-2}\lambda_\beta^2$, which (rounded to the
following integer) is the sample size of the corresponding optimum one-stage procedure.
Thus, the optimum one and two-stage rules are identical whenever $a \ge 2\lambda_\beta$,
$0 < \beta \le \tfrac{1}{2}$.
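
The one-stage benchmark $4a^{-2}\lambda_\beta^2$ and the condition of part 2 of Theorem 4.1 are easy to evaluate. The following Python sketch does so under the assumption that $\lambda_\beta$ is the upper $\beta$ point of the standard normal distribution; the function names are illustrative, and the sample size is returned unrounded.

```python
from statistics import NormalDist


def one_stage_size(a, beta):
    """Unrounded one-stage sample size 4 * lambda_beta^2 / a^2."""
    lam = NormalDist().inv_cdf(1.0 - beta)
    return 4.0 * lam ** 2 / a ** 2


def two_stage_reduces_to_one(a, beta):
    """Condition a >= 2 * lambda_beta of part 2 of Theorem 4.1."""
    lam = NormalDist().inv_cdf(1.0 - beta)
    return a >= 2.0 * lam


size = one_stage_size(0.1, 0.05)
```

For $a = .1$ and $\beta = .05$ this gives a value of about 1082.2, in line with the one-stage figure quoted in the worked example at the end of the section.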

For fixed $\beta$, $\mathcal{E}(\beta, z)$ appears to be continuous in $z$ over the interval $[0, 4\lambda_\beta^2]$ and
monotonic to either side of a unique minimizing $z$ ($= \hat{z}(\beta)$, say). See Tables I
and II.


TABLE I
$\beta = .05$


$\mathcal{E}(.05, z)$    $t^*(1, \mu^*)$    $\hat{t}(1, \mu^*)$


10.8222
10.7742    2.2106    2.4618
10.5598    2.2285    2.4809
10.1762    2.2405    2.4936
 9.7148    2.2356    2.4884
 8.8159    2.1778    2.4271
 8.1567    2.0604    2.3120
 7.8222
 7.7774
 7.7770

(The $z$ and $\mu^*$ columns and the remaining entries of this table are not legible in the source.)



TABLE II
$\beta = .01$


$z$                              $\mu^*$     $\mathcal{E}(.01, z)$    $t^*(1, \mu^*)$


                                 698.285     21.6476
                                 657.699     16.2500    3.5699
                                 526.323     14.5568    3.3815
                                 385.531     13.8947    3.1198
                                 360.422     13.8787    3.0634
                                 355.874     13.8793    3.0528
12.                              262.659     14.1471    2.7999
14.                              167.700     15.0849
$4\lambda_\beta^2 = 21.6476$       0.          21.6476    0.


Let us assume the existence of this minimum and the monotonicity and
continuity of $\mathcal{E}$ to either side of it. It then follows that $\hat{m}(a, \beta)$ must be the integer
either immediately preceding or immediately following $a^{-2}\hat{z}(\beta)$, when this latter
is non-integral; otherwise, $\hat{m}(a, \beta) = a^{-2}\hat{z}(\beta)$. Also it follows that

(4.7) $\lim_{a \to 0} a^2\hat{m}(a, \beta)/\hat{z}(\beta) = 1$.

The computations seem also to indicate that $\mu^*(\beta, 1, z, 1)$ is continuous in $z$
over the above interval and assuming this is true it follows that

$\lim_{a \to 0} \mu^*(\beta, 1, a^2\hat{m}(a, \beta), 1) = \mu^*(\beta, 1, \hat{z}(\beta), 1)$.

It should be noted that the existence of an optimum integral first sample size
$\hat{m}(a, \beta)$ rests entirely upon Theorem 4.1 and does not depend upon the above
assumptions which are of computational origin. These assumptions, by virtue
of the implications which proceed from them, allow us as indicated below to
approximate the rule (4.6) by the rule (4.8) when $a$ is small, and otherwise
enable us to locate $\hat{m}(a, \beta)$ with reference to $a^{-2}\hat{z}(\beta)$ by only two computations
of $\mathcal{E}$. Without these assumptions, a finite number of computations of $\mathcal{E}$ would be
required to make certain that the desired minimum had been attained.
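
Under the assumptions just stated, the search for the optimum integral first sample size narrows to at most two candidates. A minimal Python sketch of that narrowing is given below; it uses the value 5.5393 for the minimizing $z$ at $\beta = .05$ that is quoted in the worked example at the end of this section, and it does not evaluate $\mathcal{E}$ itself, so choosing between the two candidates is left open. The function name is illustrative.

```python
import math

Z_HAT_05 = 5.5393  # minimizing z for beta = .05, as quoted in the text


def candidate_first_sizes(a, z_hat):
    """Integers adjacent to z_hat / a^2, or that value itself if it is integral."""
    target = z_hat / a ** 2
    lower = math.floor(target)
    upper = math.ceil(target)
    return (lower,) if lower == upper else (lower, upper)


candidates_a1 = candidate_first_sizes(1.0, Z_HAT_05)
candidates_a_half = candidate_first_sizes(0.5, Z_HAT_05)
candidates_a2 = candidate_first_sizes(2.0, Z_HAT_05)
```

The values 6, 22 and 1 listed in Table III for $a = 1$, $.5$ and $2$ each fall among the corresponding candidates.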

In Tables I and II, for $\beta = .05$ and $.01$, respectively, we have given values of
the functions $\mu^*$, $\mathcal{E}$, $t^*$, and $\hat{t}$ associated with the rules $S^*(1, 1, a^{-2}\mu^*(\beta, 1, z, 1))$
for selected $z$ in the interval $(0, 4\lambda_\beta^2]$. All entries are rounded in the last place
given. More extensive calculations not tabulated here ensure that despite the
relative insensitivity of $\mathcal{E}$ in a neighborhood of the minimizing argument $\hat{z}(\beta)$,
the values of $\hat{z}(.05)$ and $\hat{z}(.01)$ are accurate to the first three decimal places with
perhaps a slight error in the fourth. The abbreviation $\mu^* = \mu^*(\beta, 1, z, 1)$ is
employed in the table headings.

In Table III, we have given values of *(a, 8) for some values of a 24 and 
for 8 = .05 and .O1. 







TABLE III

$\hat{m}(a, .05)$    $a$             $\hat{m}(a, .01)$


                     $\ge 4.6527$    0
0                    4.              1
1                    3.              1
1                    2.              3
6                    1.              10
22                   0.5             41


If we admit any non-negative real number as a first sample size (we have
already done this for second sample sizes), then clearly


(4.8) $S^*(1, 1, a^{-2}\mu^*(\beta, 1, \hat{z}(\beta), 1))$


with first sample size $a^{-2}\hat{z}(\beta)$ possesses the property attributed to (4.6) in the
wider class of rules where non-integral first sample sizes are allowed. The ratio
of the expected total sample size of (4.8) to the sample size of the corresponding
optimum one-stage procedure is $\mathcal{E}(\beta, \hat{z}(\beta))/4\lambda_\beta^2 = .7186$, $.6411$, for $\beta = .05$ and
$.01$ respectively. The first of these figures is a clear improvement upon .7569,
the corresponding ratio for an intuitive two-stage rule proposed by Owen [4].
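
The two ratios quoted above can be checked against the tabulated minima. The Python sketch below divides the smallest tabulated expected sizes from Tables I and II (7.7770 and 13.8787) by $4\lambda_\beta^2$, again assuming that $\lambda_\beta$ is the upper $\beta$ point of the standard normal distribution; the function name is illustrative.

```python
from statistics import NormalDist


def ratio_to_one_stage(expected_total, beta):
    """Ratio of a normalized expected total sample size to 4 * lambda_beta^2."""
    lam = NormalDist().inv_cdf(1.0 - beta)
    return expected_total / (4.0 * lam ** 2)


# Smallest tabulated values of the expected size in Tables I and II.
ratio_05 = ratio_to_one_stage(7.7770, 0.05)
ratio_01 = ratio_to_one_stage(13.8787, 0.01)
```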


TABLE IV 
B= 05 
H(t, 1, **) §(t, 1, e**) H(t, 1, w**) 


.7345 , 6.4300 
.6963 ' ». 3516 
6564 ; 2710 
.6149 ‘ }. 1883 
.5718 j 5. 1034 
. 5270 , ). 0161 
4806 486 9264 
-4326 j 5.8342 
. 3829 . 7393 
.3315 ; 5.6416 
2785 : 5.5409 
. 2238 j2N6 5.4370 
.1674 ’ 5.3297 
1093 5.2187 
0495 ‘ 5.1036 
). 9879 ; 9840 
. 9246 ; BAGS 
8595 ; 7291 
.7927 t*(1,yu**) = b §922 
.7239 

). 6533 

5. 5809 

5064 


.1655 0.6573 
. 1647 0. 6858 
1622 0.7144 
1581 0.7430 
1524 0.7716 
1451 0.8002 




. 1361 . 8287 
1255 8573 
.1132 8859 
0994 9145 
0839 .9430 
0667 .9716 
.0480 0002 
0276 0288 
0056 .0573 
9820 0859 
9567 1145 
9298 .1431 
.9013 .1716 
8712 2002 
8395 2288 
8061 

771i 2860 









TABLE V 
B=: 


t y(t, 1, w**) t H(t, 1, w**) 


. 1009 16.0875 2.2018 12.8353 
. 1488 15.9863 2.2497 12.6498 
. 1967 15.8812 2.2976 12.4603 
2445 15.7724 2.3454 12.2665 
. 2924 15.6598 2.3933 12.0683 
3403 15.5435 2.4412 11.8655 
. 3881 15.4235 2.4890 11.6579 
4360 15.2999 2.5369 11.4452 
4839 15.1726 2.5848 11.2271 
5317 15.0418 2.6326 11.0033 
5796 14.9074 2.6805 10.7732 
6275 . 7694 2.7284 

.6753 -6279 2.7762 

. 7232 .4829 2.8241 

7711 3343 2.8720 

8189 . 1822 2.9198 

- 8668 -0266 2.9677 

9147 3.8674 3.0156 

0.8616 16.5345 9625 3.7046 ) 3.0634 

0.9095 16.4531 0104 3.5382 

0.9573 16.3677 2.0583 3.3681 

1.0052 16.2782 2.1061 13.1943 

1.0531 16.1848 2.1540 13.0167 t(l,e**) = 3.3621 5.0153 


0. 

0.0479 

0.0957 

0.1436 

0.1915 

0.2393 

0.2872 

0.3351 

0.3829 

0.4308 

0.4787 

0.5265 

0.5744 

0.6223 

0.6701 

0.7180 16.7539 
0.7659 16.6850 
0.8137 16.6118 




For small values of $a$ (say $\le \tfrac{1}{2}$), we may with an error at most unity in the
first sample size and relatively small error in the second sample size function
employ (4.8) as a substitute for (4.6), taking first and second sample sizes to the
nearest integer. Due to lack of space, the only second sample size functions which
are tabulated here are those which correspond to the rules (4.8) for $\beta = .05$
and $.01$ (Tables IV and V, respectively). For convenience, we use the
abbreviation $\mu^{**} = \mu^*(\beta, 1, \hat{z}(\beta), 1)$. Recall that by (2.7), $\hat{y}(t, 1, \mu)$ is symmetric about
$t = 0$. By (2.9), (2.11) the second sample size specified by (4.8) corresponding
to a first sample with mean $\bar{X}$ is


$a^{-2}\hat{y}(a^{-1}\hat{z}(\beta)\bar{X}, 1, \mu^{**})$, if $|\bar{X}| < a\, t^*(1, \mu^{**})/\hat{z}(\beta)$,


and zero, otherwise.

A program for use on the Datatron prepared by the author with the aid of the
Purdue Compiler [3] is available (routine library of computing laboratory at
Purdue) for calculation of $\mathcal{E}$ for any $z$ and $\beta$ which satisfy (4.5) as well as for
any of the auxiliary functions tabulated here. The individual computations of
which the program is composed may be determined directly from the definitions
of the functions tabulated. Once $\hat{m}(a, \beta)$ has been determined or approximated
by $a^{-2}\hat{z}(\beta)$, computations of corresponding optimum second sample size values
to the degree of accuracy attained in Tables IV and V require about fifteen
minutes of Datatron time. Location of $\hat{z}(\beta)$ may take several hours or more
depending on the accuracy desired. If only $\hat{m}(a, \beta)$ is wanted, for particular
arguments, the time may be considerably reduced. For example, $\hat{m}(2, .05)$ would
require only two non-trivial calculations of $\mathcal{E}$.

Let us now consider a specific example with $a = .1$, $\beta = .05$. Using the rule
(4.8), we take a first sample of $100 \times 5.5393 = 554$ observations. If $|\bar{X}_{554}|
> .1 \times 1.8289/5.5393 = .0330$, we take no additional observations and choose
$f_0$ or $f_1$ according as $\bar{X}_{554}$ is $<$ or $> 0$. If $|\bar{X}_{554}| < .0330$, we take
$100\,\hat{y}(10 \times 5.5393\,\bar{X}_{554}, 1, \mu^{**})$ additional observations. For example, if $\bar{X}_{554}$ came out
equal to .0191, we would (using Table IV) take 705 additional observations. We
would then choose $f_0$ or $f_1$ according as the overall mean of both samples is
negative or positive. We toss a coin to decide, if the overall mean equals zero.

The optimum rule outlined above has an expected total sample size of 777.69
(a possible error in the decimal may exist due to the use of integral sample
sizes). The rule requires a minimum of 554 observations (when no second sample
is required), and can call for at most 1082 observations in the second sample.
Under either hypothesis, Prob $\{|\bar{X}_{554}| < .0330\} = .3192$, i.e. there is less than
one chance in three that the rule will require a second sample. 1082 observations
would be required by the corresponding one-stage rule.
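
The arithmetic of this example can be retraced with a few lines of Python. The sketch below takes the constants 5.5393 and 1.8289 from the text, computes the first sample size, the cutoff for the first sample mean, and the argument passed to the second sample size function, and evaluates the probability of needing a second sample from the normal density with mean $-z/2$ and variance $z$ given by $p_0(t \mid z, 1)$. It uses $z = 5.5393$ rather than the value implied by the rounded sample size, and it does not reproduce Table IV, so the figure of 705 observations is not recomputed. All names are illustrative.

```python
import math
from statistics import NormalDist

A = 0.1          # distance between the two means
Z_HAT = 5.5393   # minimizing z for beta = .05, from the text
T_STAR = 1.8289  # value of t*(1, mu**) for beta = .05, from the text

first_size = round(Z_HAT / A ** 2)  # first sample size, to the nearest integer
cutoff = A * T_STAR / Z_HAT         # bound on |mean of first sample|


def t_statistic(xbar):
    """First argument of the second sample size function in (4.8)."""
    return Z_HAT * xbar / A


def needs_second_sample(xbar):
    """True when the first sample mean falls inside the continuation interval."""
    return abs(xbar) < cutoff


# Under f_0 the statistic t is normal with mean -z/2 and variance z;
# by symmetry the same probability holds under f_1.
t_dist = NormalDist(mu=-Z_HAT / 2.0, sigma=math.sqrt(Z_HAT))
prob_second = t_dist.cdf(T_STAR) - t_dist.cdf(-T_STAR)
```

For a first sample mean of .0191 the statistic is about 1.058, which is the argument at which Table IV would be entered.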


REFERENCES 


[1] R. D. Gordon, "Value of Mills ratio of area to bounding ordinate of the normal
probability integral for large values of the argument," Ann. Math. Stat., Vol. 12
(1941), pp. 364-6.

[2] P. M. Grundy, M. J. R. Healy, and D. H. Rees, "Economic choice of the amount of
experimentation," J. Roy. Stat. Soc., Series B, Vol. 18 (1956), pp. 32-55.

[3] Sylvia Orgel, Purdue Compiler, General Description, Purdue Research Foundation,
July, 1958.

[4] Donald B. Owen, "A double sample test procedure," Ann. Math. Stat., Vol. 24 (1953),
pp. 449-457.

[5] T. Rado and P. V. Reichelderfer, Continuous Transformations in Analysis,
Springer-Verlag, Berlin, 1955.

[6] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio
test," Ann. Math. Stat., Vol. 19 (1948), pp. 326-39.

[7] Abraham Wald, Statistical Decision Functions, John Wiley and Sons, New York,
1950.





LOWER BOUNDS FOR THE EXPECTED SAMPLE SIZE AND THE
AVERAGE RISK OF A SEQUENTIAL PROCEDURE¹


By Wassily Hoeffding
University of North Carolina


Summary. Sections 1-6 are concerned with lower bounds for the expected
sample size, $E_0(N)$, of an arbitrary sequential test whose error probabilities at
two parameter points, $\theta_1$ and $\theta_2$, do not exceed given numbers, $\alpha_1$ and $\alpha_2$, where
$E_0(N)$ is evaluated at a third parameter point, $\theta_0$. The bounds in (1.3) and
(1.4) are shown to be attainable or nearly attainable in certain cases where $\theta_0$
lies between $\theta_1$ and $\theta_2$. In Section 7 lower bounds for the average risk of a general
sequential procedure are obtained. In Section 8 these bounds are used to derive
further lower bounds for $E_0(N)$ which in general are better than (1.3).


1. Introduction and main results. Let $X_1, X_2, \cdots$ be a sequence of independent
random variables having a common probability density $f$ with respect to a
$\sigma$-finite measure $\mu$. One of two decisions, $d_1$ and $d_2$, is to be made. Let $f_1$ and $f_2$
be two probability densities such that decision $d_2$ ($d_1$) is considered as wrong if
$f = f_1$ ($f_2$). We shall consider sequential tests (decision rules) for making decision
$d_1$ or $d_2$, such that the probability of a wrong decision does not exceed a positive
number $\alpha_i$ when $f = f_i$ ($i = 1, 2$). Let $N$ denote the (random) number of
observations required by such a test. This paper is mainly concerned with lower
bounds for $E_0(N)$, the expected sample size when $f = f_0$, where $f_0$ is in general
different from $f_1$ and $f_2$.

The background of this problem is as follows. Suppose that $f$ depends on a
real parameter $\theta$ and $f_i$ corresponds to the value $\theta_i$, where $\theta_1 < \theta_2$. Suppose
further that decision $d_1$ or $d_2$ is preferred according as $\theta \le \theta_1$ or $\theta \ge \theta_2$, and
that neither decision is strongly preferred if $\theta_1 < \theta < \theta_2$. If we require that the
probability of a wrong decision does not exceed $\alpha_1$ ($\alpha_2$) if $\theta \le \theta_1$ ($\theta \ge \theta_2$), the
condition of the preceding paragraph will be satisfied. (In many important cases
a test which satisfies the latter condition also satisfies the former.) It is known
[14] that Wald's sequential probability ratio (SPR) test for testing $\theta_1$ against
$\theta_2$, with error probabilities equal to $\alpha_1$ and $\alpha_2$, minimizes the expected sample
size at these two parameter values. In typical cases its expected sample size is
largest when $\theta$ is between $\theta_1$ and $\theta_2$ (that is, when neither decision is strongly


Received September 2, 1959. 

¹ This research was supported by the United States Air Force through the Air Force
Office of Scientific Research of the Air Research and Development Command, under
Contract No. AF 49(638)-261. Reproduction in whole or in part is permitted for any purpose
of the United States Government.

Part of this work was done while the author was a visiting professor at Stanford
University.







SEQUENTIAL SAMPLE SIZE AND RISK 353 


preferred), and in general there exist tests whose expected sample size at these
intermediate $\theta$ values is smaller than that of the SPR test. (A special case in
which a SPR test minimizes the maximum expected sample size will be discussed
in Section 4.) In principle it is possible to construct a test which minimizes the
expected sample size at an arbitrary $\theta$ value or minimizes the maximum expected
sample size. Kiefer and Weiss [7] have proved important qualitative properties
of such tests. The actual construction, however, of a test having this property,
as well as the evaluation of its expected sample size and its error probabilities,
meets with difficulties which have not been overcome so far (except for a few
special cases). Therefore attempts have been made to find a test which, without
actually minimizing the maximum expected sample size, comes close to this
goal, or at least substantially improves upon the performance of known tests.
I mention in particular the work of Donnelly [5] and T. W. Anderson [1] who,
independently of each other, considered a class of tests such that, if $\theta$ is the
mean of a normal distribution, the boundaries for the cumulative sums are not
parallel lines, as in the SPR test, but converging straight lines. (Anderson also
considered truncated tests of this type.) The performance of these and other
tests can, to some extent, be judged by comparing, at any parameter point $\theta$,
the expected sample size of the test with the smallest expected sample size
attainable by any test having the same error probabilities at $\theta_1$ and $\theta_2$. In the
ignorance of the minimum expected sample size, the comparison may be made
with a lower bound for this minimum. If the discrepancy is small, both the test
(as judged by this criterion) and the bound cannot be greatly improved. Our
main concern will be with bounds which are best when $\theta$ is between $\theta_1$ and $\theta_2$.

We admit arbitrary (in general, randomized) sequential tests which terminate
with probability one under each of $f_0$, $f_1$ and $f_2$. We also assume, with no loss
of generality, that $E_0(N) < \infty$. To exclude trivialities, we suppose that
$\alpha_1 + \alpha_2 < 1$.

The first lower bound for the expected sample size was given by Wald ([11],
p. 156) who proved, for the case $f_0 = f_1$, that

$$E_1(N) \ge \frac{\alpha_1 \log \dfrac{\alpha_1}{1 - \alpha_2} + (1 - \alpha_1) \log \dfrac{1 - \alpha_1}{\alpha_2}}{\int f_1 \log (f_1/f_2)\, d\mu} \tag{1.1}$$

and an analogous inequality for $f_0 = f_2$. (Wald's proof assumes a nonrandomized
test, but this restriction is easy to remove.) Both the numerator and the
denominator in (1.1) are positive (since $\log x > 1 - x^{-1}$ for $x > 0$ unless $x = 1$); the
integral in the denominator can be equal to $+\infty$, in which case the lower bound
has the trivial value 0. The sign of equality in (1.1) can be attained with a
SPR test in the case where the ratio $f_1/f_2$ takes on the two values $C$ and $1/C$
only, provided that the values $\alpha_1$ and $\alpha_2$ can be achieved as error probabilities
in this test. In certain other cases the sign of equality can be nearly attained
with a SPR test.
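As a numerical illustration of Wald's bound (1.1), the following minimal sketch evaluates its right-hand side. The function names and the choice of example (unit-variance normal densities with means $-\delta$ and $\delta$, for which $\int f_1 \log(f_1/f_2)\,d\mu = 2\delta^2$) are assumptions made here for illustration and are not part of the paper.

```python
import math

def wald_lower_bound(alpha1, alpha2, kl_f1_f2):
    """Right-hand side of (1.1): a lower bound for E_1(N).

    kl_f1_f2 is the integral of f1 * log(f1 / f2) with respect to mu.
    """
    numerator = (alpha1 * math.log(alpha1 / (1.0 - alpha2))
                 + (1.0 - alpha1) * math.log((1.0 - alpha1) / alpha2))
    return numerator / kl_f1_f2

def kl_unit_normals(mean_a, mean_b):
    """Integral of f_a * log(f_a / f_b) for N(mean_a, 1) and N(mean_b, 1)."""
    return 0.5 * (mean_a - mean_b) ** 2

# Illustrative values (assumed): delta = 0.1, alpha1 = alpha2 = 0.05.
delta = 0.1
bound = wald_lower_bound(0.05, 0.05, kl_unit_normals(-delta, delta))
print(f"lower bound for E_1(N): {bound:.2f}")
```

If the denominator integral is infinite, the bound is 0, in line with the remark above; the sketch does not treat that case separately.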





354 WASSILY HOEFFDING 


The following extension of (1.1) to the case of an arbitrary $f_0$ has been given
by the author [6]:

$$E_0(N) \ge \sup_{0 < c < 1} \frac{-\log \left[ \alpha_1^{c} (1 - \alpha_2)^{1-c} + (1 - \alpha_1)^{c} \alpha_2^{1-c} \right]}{c \int f_0 \log (f_0/f_1)\, d\mu + (1 - c) \int f_0 \log (f_0/f_2)\, d\mu}. \tag{1.2}$$

For $f_0 = f_1$ and $c \to 1$, (1.2) reduces to (1.1). This bound is likely to be close
when $f_0$ is close to $f_1$ or $f_2$.
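In this paper two new inequalities will be proved,

$$E_0(N) \ge \frac{1 - \alpha_1 - \alpha_2}{1 - \int \min (f_0, f_1, f_2)\, d\mu} \tag{1.3}$$

and

$$E_0(N) \ge \frac{\left\{ \left[ (\tau/4)^2 - \zeta \log (\alpha_1 + \alpha_2) \right]^{1/2} - \tau/4 \right\}^2}{\zeta^2}, \tag{1.4}$$

where

$$\zeta = \max (\zeta_1, \zeta_2), \qquad \zeta_i = \int f_0 \log (f_0/f_i)\, d\mu, \tag{1.5}$$

and

$$\tau^2 = \int \left[ \log (f_2/f_1) - \zeta_1 + \zeta_2 \right]^2 f_0\, d\mu. \tag{1.6}$$

Note that $\zeta_i \ge 0$, and $\zeta_i > 0$ if $f_0$ and $f_i$ are densities of different distributions.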
In the proof of (1.4) it will be assumed that, in addition to the existence of
the integrals in (1.5) and (1.6),

$$f_0(x) = 0 \text{ implies } \min [f_1(x), f_2(x)] = 0, \tag{1.7}$$

and that the equation

$$E_0 \Big( \sum_{j=1}^{N} Y_j \Big)^2 = \tau^2 E_0(N) \tag{1.8}$$

is satisfied, where

$$Y_j = \log \frac{f_2(X_j)}{f_1(X_j)} - \zeta_1 + \zeta_2. \tag{1.9}$$

Concerning the last assumption we note that $\zeta_1 - \zeta_2 = \int f_0 [\log (f_2/f_1)]\, d\mu$
so that $E_0(Y_j) = 0$ and, by (1.6), $E_0(Y_j^2) = \tau^2$. Equation (1.8) has been proved
by Wald [9] and Wolfowitz [15] under certain conditions; see also Seitz and
Winkelbauer [8]. It certainly holds if $N$ is bounded or if $Y_1 + \cdots + Y_m$ is
bounded for $m < N$. It is clear that, if condition (1.8) is satisfied for a test
which minimizes $E_0(N)$, then inequality (1.4) is true also for any other test. In
particular this is true under the assumptions of Theorem 4 of Kiefer and Weiss
[7], which imply that, if a test minimizes $E_0(N)$, then $N$ is bounded.

Inequalities (1.3) and (1.4) will be proved and discussed in the following
sections. Here we mention only the conditions for the attainment or
near-attainment of equality. In inequality (1.3) the sign of equality holds under
certain conditions which are typified by the following two cases. In the first
case the densities $f_i$ are arbitrary except that $f_0(x) \ge \min [f_1(x), f_2(x)]$, but
$\alpha_1$ and $\alpha_2$ are restricted to values which are attainable with a test which requires
at most one observation, $x_1$, and decision $d_1$ ($d_2$) is made if $f_1(x_1) - f_2(x_1) > 0$
($< 0$). In the second case the $f_i$ are rectangular densities on intervals of common
length such that the mean of $f_0$ is between the means of $f_1$ and $f_2$. Then
equality in (1.3) is attained with a version of the SPR test for arbitrary values
$\alpha_1$ and $\alpha_2$.

In (1.4) strict equality is not attainable except in trivial cases. If $f_0$, $f_1$ and
$f_2$ are normal probability densities with variance 1 and respective means $\theta = 0$,
$-\delta$ and $\delta$, then for $\alpha_1 = \alpha_2 = \alpha < \tfrac12$ equality in (1.4) is nearly attained with a
fixed sample size test if $\alpha$ is very small and with a SPR test if $\alpha$ is sufficiently
large. For $\alpha = 0.05$ and $\alpha = 0.01$, the expected sample size at $\theta = 0$ of a test
considered by Anderson [1] comes remarkably close to the lower bound in (1.4).

Lower bounds for the average risk of a general sequential procedure and
resulting improvements of inequality (1.3) are stated and proved in Sections
7 and 8.


2. Some lemmas. A randomized sequential test for deciding between $d_1$ and
$d_2$, based on the sequence $X_1, X_2, \cdots$, can be characterized by two sequences
of random variables, $\psi_0, \psi_1, \psi_2, \cdots$ and $\varphi_0, \varphi_1, \varphi_2, \cdots$ such that $\psi_n \ge 0$,
$\psi_0 + \psi_1 + \psi_2 + \cdots \le 1$, $0 \le \varphi_n \le 1$, and both $\psi_n$ and $\varphi_n$ are functions of
$X_1, \cdots, X_n$ only; $\psi_0$ and $\varphi_0$ are constants. Here $\psi_n$ denotes the probability of $N = n$,
under the condition that the values $X_1, \cdots, X_n$ have been observed, where $N$
is the number of observations taken before making a terminal decision, and
$\varphi_n$ and $1 - \varphi_n$ are respectively the probabilities of making decisions $d_2$ and $d_1$
under the condition that $N = n$ and the values $X_1, \cdots, X_n$ have been observed.
A test defined in this way will be denoted by $\{\psi_n, \varphi_n\}$. The sequence $\{\psi_n\}$ will
be referred to as the stopping rule and $\{\varphi_n\}$ as the terminal decision rule of the
test. It will be assumed that $N < \infty$, that is $\psi_0 + \psi_1 + \cdots = 1$, with probability
one when the common probability density $f$ of the independent random variables
$X_1, X_2, \cdots$ is any one of the functions $f_0$, $f_1$, $f_2$. We note that the probability
of making decision $d_2$ when $f = f_i$ equals

$$E_i(\varphi_N) = E_i \Big( \sum_{n \ge 0} \psi_n \varphi_n \Big).$$

The probability density $\prod_{j=1}^{n} f_i(x_j)$ with respect to the product measure
$\mu^n$ ($n \ge 1$) will be written $f_{i,n}$ for short. It will be convenient to define

$$f_{i,n}/f_{j,n} = 1 \quad \text{if } n = 0,$$

in accordance with the convention that an "empty" product is equal to 1.
Similarly, the empty sum, $\sum_{j=1}^{n}$ with $n = 0$, is defined to be 0. The notation $\varphi_n^0$
will serve to denote any terminal decision rule such that for $n = 1, 2, \cdots$

$$\varphi_n^0 = \begin{cases} 1 & \text{if } f_{1,n} < f_{2,n}, \\ 0 & \text{if } f_{1,n} > f_{2,n}. \end{cases} \tag{2.1}$$

The following lemmas will be needed. Lemmas 1, 3, 4, 5 and 6 will be used in
the proof of inequality (1.3), Lemmas 1, 2, 4 and 7 in the proof of (1.4). Most
of the lemmas are known. The simple proofs of all but Lemma 4 are included
for convenience.


Lemma 1. If $\{\psi_n, \varphi_n\}$ is an arbitrary sequential test,

$$E_1(\varphi_N) + E_2(1 - \varphi_N) \ge E_1(\varphi_N^0) + E_2(1 - \varphi_N^0), \tag{2.2}$$

where the same stopping rule $\{\psi_n\}$ is used on both sides of the inequality.

Lemma 2. If $f_0(x) = 0$ implies $\min [f_1(x), f_2(x)] = 0$, then for any stopping
rule $\{\psi_n\}$,

$$E_1(\varphi_N^0) + E_2(1 - \varphi_N^0) = E_0 [\min (f_{1,N}, f_{2,N})/f_{0,N}]. \tag{2.3}$$

Lemma 3. For any stopping rule $\{\psi_n\}$,

$$E_1(\varphi_N^0) + E_2(1 - \varphi_N^0) \ge E_0 [\min (f_{0,N}, f_{1,N}, f_{2,N})/f_{0,N}], \tag{2.4}$$

where the sign of equality holds if

$$f_{0,N} \ge \min (f_{1,N}, f_{2,N}) \tag{2.5}$$

with probability one under $f_1$ or $f_2$.

To prove these three lemmas we note that for any test $\{\psi_n, \varphi_n\}$

$$\begin{aligned} E_1(\varphi_N) + E_2(1 - \varphi_N) &= \psi_0 + \sum_{n=1}^{\infty} \int \psi_n [\varphi_n f_{1,n} + (1 - \varphi_n) f_{2,n}]\, d\mu^n \\ &\ge \psi_0 + \sum_{n=1}^{\infty} \int \psi_n \min (f_{1,n}, f_{2,n})\, d\mu^n \end{aligned} \tag{2.6}$$

with equality for $\varphi_n = \varphi_n^0$, $n = 1, 2, \cdots$. This proves Lemma 1.

If the condition of Lemma 2 is satisfied, we can write $\min (f_{1,n}, f_{2,n}) =
[\min (f_{1,n}, f_{2,n})/f_{0,n}] f_{0,n}$ in the integrand in (2.6), which implies Lemma 2.

Finally, using (2.6),

$$E_1(\varphi_N^0) + E_2(1 - \varphi_N^0) \ge \psi_0 + \sum_{n=1}^{\infty} \int \psi_n \min (f_{0,n}, f_{1,n}, f_{2,n})\, d\mu^n.$$

Upon dividing and multiplying by $f_{0,n}$ in the integrand we obtain inequality
(2.4). The condition for equality in Lemma 3 is easy to verify.

Lemma 4. If $E_0(N) < \infty$ and $t(x)$ is a real-valued function such that $E_0[t(X_1)]$
exists, then

$$E_0 \Big[ \sum_{j=1}^{N} t(X_j) \Big] = E_0[t(X_1)]\, E_0(N). \tag{2.7}$$

Equation (2.7) is originally due to Wald [11] and has been proved under the
present assumptions, except for the trivial extension to randomized tests, by
Blackwell [3]; see also Wolfowitz [15].


Lemma 5. If $a_j \ge 0$, $b_j \ge 0$, $c_j \ge 0$, $j = 1, \cdots, n$, then

$$\min \Big( \prod_{j=1}^{n} a_j, \prod_{j=1}^{n} b_j, \prod_{j=1}^{n} c_j \Big) \ge \prod_{j=1}^{n} \min (a_j, b_j, c_j). \tag{2.8}$$

In fact, each of $a_j$, $b_j$, $c_j$ is $\ge \min (a_j, b_j, c_j)$.

Lemma 6. If $0 \le d_j \le 1$, $j = 1, \cdots, n$, then

$$\sum_{j=1}^{n} (1 - d_j) \ge 1 - \prod_{j=1}^{n} d_j. \tag{2.9}$$

The sign of equality is attained if and only if all but at most one of $d_1, \cdots, d_n$
are equal to 1.

Lemma 6, with the condition for equality, follows from the identity

$$\sum_{j=1}^{n} (1 - d_j) - 1 + \prod_{j=1}^{n} d_j = \sum_{m=2}^{n} (1 - d_m) \Big( 1 - \prod_{j=1}^{m-1} d_j \Big).$$
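The identity behind Lemma 6 is easy to check numerically. The sketch below is an illustration only (the function name and the random test values are assumptions, not part of the paper): it evaluates both sides of the identity, and since every term on the right is nonnegative when $0 \le d_j \le 1$, inequality (2.9) follows.

```python
import math
import random

def lemma6_sides(d):
    """Return (left, right) of the identity used to prove Lemma 6.

    left  = sum_j (1 - d_j) - 1 + prod_j d_j
    right = sum_{m=2}^{n} (1 - d_m) * (1 - prod_{j<m} d_j)
    The list d is 0-based, so d[i] plays the role of d_{i+1}.
    """
    left = sum(1.0 - dj for dj in d) - 1.0 + math.prod(d)
    right = sum((1.0 - d[i]) * (1.0 - math.prod(d[:i])) for i in range(1, len(d)))
    return left, right

random.seed(1)
d = [random.random() for _ in range(6)]
left, right = lemma6_sides(d)
print(abs(left - right) < 1e-12, left >= 0.0)

# Equality case: all but one of the d_j equal 1, so both sides vanish.
print(lemma6_sides([1.0, 1.0, 0.3, 1.0]))
```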


Lemma 7. If $U$ is a random variable, $E(e^U) \ge e^{E(U)}$ whenever the expectations
exist. The sign of equality holds if and only if $U$ is equal to a constant with
probability one.

Proof. Let $V = U - E(U)$. By Taylor's formula, $e^V \ge 1 + V$, with equality
only if $V = 0$. Hence $E(e^V) \ge 1$, and the lemma follows.


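3. Proof of inequality (1.3). By Lemma 1, for any test which satisfies
$E_1(\varphi_N) \le \alpha_1$ and $E_2(1 - \varphi_N) \le \alpha_2$,

$$\alpha_1 + \alpha_2 \ge E_1(\varphi_N^0) + E_2(1 - \varphi_N^0). \tag{3.1}$$

By Lemma 3,

$$E_1(\varphi_N^0) + E_2(1 - \varphi_N^0) \ge E_0 [\min (f_{0,N}, f_{1,N}, f_{2,N})/f_{0,N}]. \tag{3.2}$$

By Lemma 5,

$$\min (f_{0,n}, f_{1,n}, f_{2,n}) \ge \prod_{j=1}^{n} \min [f_0(x_j), f_1(x_j), f_2(x_j)]. \tag{3.3}$$

Hence if we write $r(x) = \min [f_0(x), f_1(x), f_2(x)]/f_0(x)$, we have

$$E_0 [\min (f_{0,N}, f_{1,N}, f_{2,N})/f_{0,N}] \ge E_0 \Big[ \prod_{j=1}^{N} r(X_j) \Big]. \tag{3.4}$$

Note that $0 \le r(x) \le 1$. If we apply Lemma 6 and then Lemma 4, we obtain

$$\begin{aligned} E_0 \Big[ \prod_{j=1}^{N} r(X_j) \Big] &\ge E_0 \Big\{ 1 - \sum_{j=1}^{N} [1 - r(X_j)] \Big\} \\ &= 1 - E_0(N)\, E_0 [1 - r(X_1)] = 1 - E_0(N) \Big[ 1 - \int \min (f_0, f_1, f_2)\, d\mu \Big]. \end{aligned} \tag{3.5}$$

Inequality (1.3) now follows from (3.1), (3.2), (3.4) and (3.5).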


4. Discussion of inequality (1.3). The sign of equality in (1.3) holds if and
only if it holds in each of the inequalities (3.1), (3.2), (3.4) and (3.5). Equality
in (3.1) is attained if $\varphi_n = \varphi_n^0$ and

$$E_1(\varphi_N^0) = \alpha_1, \qquad E_2(1 - \varphi_N^0) = \alpha_2. \tag{4.1}$$

By Lemma 6, equality in (3.5) holds if and only if, for each $n \ge 1$, $N = n$
implies that all but at most one of $r(X_1), \cdots, r(X_n)$ are equal to 1, with
probability one under $f_0$. This is the case if

$$f_0(X_j) = f_1(X_j) = f_2(X_j) \quad \text{for } j = 1, \cdots, N - 1. \tag{4.2}$$

If this condition is satisfied, equality also holds in (3.4). If, in addition to (4.2),

$$f_0(x) \ge \min [f_1(x), f_2(x)], \tag{4.3}$$

then condition (2.5) of Lemma 3 is satisfied and hence equality is attained in
(3.2). Thus conditions (4.1), (4.2) and (4.3) are sufficient for the attainment
of equality in (1.3).

Condition (4.3) is satisfied for many common one-parameter families of
distributions when $\theta_0$ is between $\theta_1$ and $\theta_2$. Under this condition, (1.3) reduces to

$$E_0(N) \ge \frac{1 - \alpha_1 - \alpha_2}{1 - \int \min (f_1, f_2)\, d\mu} = \frac{1 - \alpha_1 - \alpha_2}{\tfrac12 \int |f_1 - f_2|\, d\mu}. \tag{4.4}$$

Condition (4.2) is satisfied, and equality holds in (4.4), in the following two
cases.

The first case is where the densities are arbitrary, subject only to (4.3), but
the values $\alpha_1$ and $\alpha_2$ are such that they can be attained as error probabilities
with a test which requires at most one observation ($N \le 1$), and if an
observation $x$ is taken, decision $d_1$ ($d_2$) is made if $f_1(x) - f_2(x)$ is $> 0$ ($< 0$).

The second case in which equality in (4.4) is attained is where, in addition
to (4.3), the set $C = \{x \mid f_0(x) = f_1(x) = f_2(x)\}$ has a positive probability.
Let $C_0 \subset C$ and let the complement of $C_0$ be subdivided into two disjoint sets
$C_1$ and $C_2$ such that $f_1(x) - f_2(x) \ge 0$ if $x \in C_1$ and $\le 0$ if $x \in C_2$. Let $N$ be the
least $n$ such that $x_n \notin C_0$. Decision $d_i$ is made if $x_N \in C_i$, $i = 1, 2$. (Instead,
suitable randomized decisions can be made when $x_N \in C$.) Then it can be readily
verified that equality holds in (4.4), with $E_0(N) = 1/(1 - p_0)$, $\alpha_1 = p_{12}/(1 - p_0)$,
$\alpha_2 = p_{21}/(1 - p_0)$, where $p_0$ is the probability of $C_0$ (under any $f_i$)
and $p_{ij}$ is the probability of $C_j$ under $f_i$. In the particular case where $\mu$ is
linear Lebesgue measure, $f_i(x) = g(x - \theta_i)$, $g(x) = 1/L$, $-L/2 \le x \le L/2$, $g(x) = 0$
otherwise, $0 < \theta_2 - \theta_1 < L$, and $\theta_1 \le \theta_0 \le \theta_2$, we have
$C = [\theta_2 - L/2, \theta_1 + L/2]$. Let $\theta_2 - L/2 \le c \le d \le \theta_1 + L/2$, $C_0 = (c, d)$,
$C_1 = (-\infty, c]$, $C_2 = [d, +\infty)$. Then with the test just described, perhaps preceded
by a randomized decision as to whether to take at least one observation, any error
probabilities can be attained ($\alpha_1 \ge 0$, $\alpha_2 \ge 0$, $\alpha_1 + \alpha_2 \le 1$). Moreover, the
maximum with respect to all real $\theta$ of the expected sample size of this test when
the density is $g(x - \theta)$ is attained when $\theta$ is between $\theta_1$ and $\theta_2$. Hence the
test minimizes the maximum expected sample size. It should be noted that the
present test is a modified version of the SPR test as defined in Wald [12], p. 120.
It differs from the latter only in this respect: If the probability ratio after $n$
observations equals one of the two numbers $A$ and $B$ (in Wald's notation; in
our case $A = B = 1$), the stopping decision and the terminal decision may
depend on the position of the sample point in the corresponding sets, instead of
being randomized decisions.
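The rectangular case can be checked with exact arithmetic. The sketch below is an illustration under assumed values ($L = 1$, $\theta_1 = 0$, $\theta_2 = 1/2$, $c = 1/10$, $d = 2/5$, all chosen here and not taken from the paper); the helper `uniform_prob` is likewise hypothetical. It computes $E_0(N)$, $\alpha_1$ and $\alpha_2$ for the test described above and compares $E_0(N)$ with the right-hand side of (4.4).

```python
from fractions import Fraction as F

def uniform_prob(theta, length, lo=None, hi=None):
    """P(lo <= X <= hi) for X uniform on [theta - length/2, theta + length/2].

    lo=None or hi=None stands for an unbounded end of the interval.
    """
    a, b = theta - length / 2, theta + length / 2
    lo = a if lo is None else max(lo, a)
    hi = b if hi is None else min(hi, b)
    return max(hi - lo, F(0)) / length

# Assumed illustrative values.
L, theta1, theta2 = F(1), F(0), F(1, 2)
c, d = F(1, 10), F(2, 5)          # theta2 - L/2 <= c <= d <= theta1 + L/2

p0 = uniform_prob(theta1, L, c, d)         # probability of C0 (same under every f_i)
p12 = uniform_prob(theta1, L, lo=d)        # probability of C2 under f_1
p21 = uniform_prob(theta2, L, hi=c)        # probability of C1 under f_2

expected_n = 1 / (1 - p0)                  # N is geometric with success probability 1 - p0
alpha1, alpha2 = p12 / (1 - p0), p21 / (1 - p0)

# Integral of min(f1, f2): length of the common support times 1/L.
overlap = ((theta1 + L / 2) - (theta2 - L / 2)) / L
bound = (1 - alpha1 - alpha2) / (1 - overlap)

print(expected_n, alpha1, alpha2, bound)
```

With these values the expected sample size and the bound coincide, which is the equality asserted for (4.4).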

It is of interest to note that the bound in (1.3) is always positive whereas the
bounds in (1.1) and (1.2) take on the trivial value 0 if the integrals in their
denominators are equal to $+\infty$. However, in most of the more common cases
the bounds (1.1) and (1.2) (as well as (1.4)) are better than (1.3). For instance,
if $f_0$, $f_1$, and $f_2$ are normal distributions with a common variance and respective
means 0, $-\delta$ and $\delta$, the bound in (1.3) is of the order $\delta^{-1}$, but those in (1.2) and
(1.4) are proportional to $\delta^{-2}$ and hence better than (1.3) if $\delta$ is small.
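The different orders of magnitude can be seen numerically. The following sketch assumes unit variance and $\alpha_1 = \alpha_2 = \alpha$; the function names are hypothetical. It uses $\int \min(f_0, f_1, f_2)\,d\mu = 2\Phi(-\delta)$ for means 0, $-\delta$, $\delta$ (here $f_0 \ge \min(f_1, f_2)$, so the integrand is $\min(f_1, f_2)$), and for (1.4) it uses the specialization to this case that appears as (6.1) in Section 6.

```python
import math
from statistics import NormalDist

def bound_1_3_normal(alpha, delta):
    """Bound (1.3) for unit-variance normals with means 0, -delta, delta."""
    overlap = 2.0 * NormalDist().cdf(-delta)   # integral of min(f0, f1, f2)
    return (1.0 - 2.0 * alpha) / (1.0 - overlap)

def bound_1_4_normal(alpha, delta):
    """Bound (1.4) in the same case, in the form (6.1)."""
    return (math.sqrt(1.0 - 2.0 * math.log(2.0 * alpha)) - 1.0) ** 2 / delta ** 2

alpha = 0.05  # assumed illustrative error probability
for delta in (0.4, 0.2, 0.1, 0.05):
    print(delta,
          round(bound_1_3_normal(alpha, delta), 2),
          round(bound_1_4_normal(alpha, delta), 2))
```

Halving $\delta$ roughly doubles the first bound and quadruples the second, in agreement with the orders $\delta^{-1}$ and $\delta^{-2}$.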

There is an interesting similarity between Wald's inequality (1.1) and
inequality (1.3) (or (4.4)) for $f_0 = f_1$. If $\alpha_1$ and $\alpha_2$ denote the actual error
probabilities, both inequalities are of the form

$$E_1(N) \ge \frac{D(f_1^*, f_2^*)}{D(f_1, f_2)}, \tag{4.5}$$

where $D$ is the measure of discrepancy between two distributions which appears
in the denominators of (1.1) and (4.4), and $f_i^*$ denotes the distribution on the
two points $d_1$, $d_2$ of the decision space such that the probability assigned to $d_j$
is the probability of making decision $d_j$ when $f = f_i$; more precisely, $f_i^*$ is the
probability density with respect to a measure $\mu^*$ such that $\mu^*(d_1) = \mu^*(d_2) = 1$
and $1 - f_1^*(d_1) = f_1^*(d_2) = \alpha_1$, $1 - f_2^*(d_2) = f_2^*(d_1) = \alpha_2$.

It will be seen in Section 8 that inequality (1.3) can be deduced from a lower
bound for the average risk of a general sequential procedure. However, the
direct proof given in Section 3 makes it easier to determine the conditions for
equality. Inequalities which are better but more complicated than (1.3) are
given in Section 8.


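5. Proof of inequality (1.4). We assume that the integrals $\zeta_1$, $\zeta_2$ and $\tau^2$ in (1.5)
and (1.6) exist and that the conditions (1.7) and (1.8) are satisfied. Let, for
$i = 1, 2$,

$$Z_{i,n} = \sum_{j=1}^{n} \Big( \log \frac{f_0(X_j)}{f_i(X_j)} - \zeta_i \Big) \tag{5.1}$$

and

$$Z_n = Z_{1,n} - Z_{2,n} = \sum_{j=1}^{n} Y_j,$$

where $Y_j$ is defined in (1.9). Then

$$f_{i,n}/f_{0,n} = e^{-Z_{i,n} - \zeta_i n}. \tag{5.2}$$

Hence, by Lemma 2,

$$\begin{aligned} E_1(\varphi_N^0) + E_2(1 - \varphi_N^0) &= E_0 \Big[ \min \Big( \frac{f_{1,N}}{f_{0,N}}, \frac{f_{2,N}}{f_{0,N}} \Big) \Big] \\ &= E_0 \big[ e^{-\max (Z_{1,N} + \zeta_1 N,\; Z_{2,N} + \zeta_2 N)} \big] \ge E_0 \big[ e^{-\max (Z_{1,N}, Z_{2,N}) - \zeta N} \big], \end{aligned} \tag{5.3}$$

where $\zeta = \max (\zeta_1, \zeta_2)$. By Lemma 7,

$$E_0 \big[ e^{-\max (Z_{1,N}, Z_{2,N}) - \zeta N} \big] \ge e^{-E_0 [\max (Z_{1,N}, Z_{2,N})] - \zeta E_0(N)}. \tag{5.4}$$

Since $2 \max (Z_{1,N}, Z_{2,N}) = Z_{1,N} + Z_{2,N} + |Z_{1,N} - Z_{2,N}|$, $Z_{1,N} - Z_{2,N} = Z_N$,
and, by Lemma 4, $E_0(Z_{1,N}) = E_0(Z_{2,N}) = 0$, we have

$$E_0 [\max (Z_{1,N}, Z_{2,N})] = \tfrac12 E_0(|Z_N|). \tag{5.5}$$

Also

$$E_0(|Z_N|) \le [E_0(Z_N^2)]^{1/2} = \tau [E_0(N)]^{1/2}, \tag{5.6}$$

where we have used equation (1.8).

Thus if $\{\psi_n, \varphi_n\}$ is any test such that $E_1(\varphi_N) \le \alpha_1$, $E_2(1 - \varphi_N) \le \alpha_2$, and
equation (1.8) is satisfied, it follows from Lemma 1 and the relations (5.3),
(5.4), (5.5) and (5.6) that

$$\log (\alpha_1 + \alpha_2) \ge -(\tau/2) [E_0(N)]^{1/2} - \zeta E_0(N).$$

Solving this inequality for $E_0(N)$, we obtain (1.4).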
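The last step is the solution of a quadratic inequality in $x = [E_0(N)]^{1/2}$, namely $\zeta x^2 + (\tau/2) x + \log(\alpha_1 + \alpha_2) \ge 0$. The sketch below is an illustration only (function name and example values are assumptions): it evaluates the right-hand side of (1.4) and checks that its square root is the positive root of the quadratic.

```python
import math

def bound_1_4(alpha1, alpha2, zeta, tau):
    """Right-hand side of (1.4), assuming zeta > 0 and alpha1 + alpha2 < 1."""
    log_term = math.log(alpha1 + alpha2)            # negative
    root = math.sqrt((tau / 4.0) ** 2 - zeta * log_term)
    return ((root - tau / 4.0) / zeta) ** 2

# Assumed example: unit-variance normals with means 0, -delta, delta,
# for which zeta_1 = zeta_2 = delta**2 / 2 and tau = 2 * delta.
delta, alpha = 0.1, 0.05
b = bound_1_4(alpha, alpha, delta ** 2 / 2.0, 2.0 * delta)

# x = sqrt(b) should satisfy zeta*x^2 + (tau/2)*x + log(alpha1 + alpha2) = 0.
x = math.sqrt(b)
residual = (delta ** 2 / 2.0) * x ** 2 + delta * x + math.log(2.0 * alpha)
print(round(b, 1), abs(residual) < 1e-12)
```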


6. Discussion of inequality (1.4). Inequality (1.4) has been obtained by
combining the four inequalities (3.1), (5.3), (5.4) and (5.6). Equality in (3.1) is
always attainable for suitable $\alpha_1$ and $\alpha_2$, and in (5.3) it holds if $\zeta_1 = \zeta_2$ ($= \zeta$).
In (5.4) the sign of equality holds if and only if $\max (Z_{1,N}, Z_{2,N}) + \zeta N$ is
constant with probability one (see Lemma 7), and in (5.6) it holds if and only if
$|Z_N|$, that is $|Z_{1,N} - Z_{2,N}|$, is constant with probability one, both
probabilities evaluated under $f_0$. The last two conditions cannot be satisfied
simultaneously except in trivial cases.

To obtain an idea of how close the bound in (1.4) can come to the minimum
attainable value of $E_0(N)$, we shall consider the following special case. Let $f_i$
be the normal probability density with variance 1 and mean $\theta_i$, where $\theta_0 = 0$,
$\theta_1 = -\delta$ and $\theta_2 = \delta > 0$. Then $\zeta_1 = \zeta_2 = \delta^2/2$, $\tau = 2\delta$, and inequality (1.4)
becomes

$$E_0(N) \ge \delta^{-2} \big\{ [1 - 2 \log (2\alpha)]^{1/2} - 1 \big\}^2, \tag{6.1}$$

where $2\alpha = \alpha_1 + \alpha_2$. This bound will be compared with the values of $E_0(N)$
for a fixed sample size test, Wald's SPR test, and a test considered by Anderson,
with error probabilities $\alpha_1 = \alpha_2 = \alpha$ ($< \tfrac12$) in each case.







Let $S_n = X_1 + \cdots + X_n$. For a fixed sample size test such that decision
$d_1$ or $d_2$ is made according as $S_n < 0$ or $S_n > 0$, the error probabilities at $\theta = -\delta$
and $\theta = \delta$ are both equal to $\Phi(-\delta n^{1/2})$, where

$$\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-y^2/2}\, dy.$$

Hence $E_0(N)$ is the least $n$ such that $\Phi(-\delta n^{1/2}) \le \alpha$. If $\lambda = \lambda(\alpha)$ is defined by
$\Phi(-\lambda) = \alpha$, we have

$$E_0(N) = \delta^{-2} \lambda^2, \tag{6.2}$$

exactly or with a good approximation. If $\alpha \to 0$, then $\lambda \to \infty$ and $\alpha = \Phi(-\lambda) =
(2\pi)^{-1/2} \lambda^{-1} e^{-\lambda^2/2} (1 + O(\lambda^{-2}))$. Hence $\lambda^2 = -2 \log \alpha + O[\log (-2 \log \alpha)]$. The
factor of $\delta^{-2}$ in inequality (6.1) is

$$\big\{ [1 - 2 \log (2\alpha)]^{1/2} - 1 \big\}^2 = -2 \log \alpha + O[(-2 \log \alpha)^{1/2}].$$

Thus if $\alpha$ is small enough, the bound in (6.1) is nearly attained with a fixed
sample size test, although the asymptotic approach is extremely slow. It follows
that the fixed sample size test nearly minimizes the expected sample size at
$\theta = 0$ when $\alpha$ is (very) small.
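How slowly the two quantities approach each other can be seen by tabulating the ratio of $\lambda^2$ from (6.2) to the factor of $\delta^{-2}$ in (6.1). This is a minimal sketch; the function names and the particular $\alpha$ values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def lam(alpha):
    """lambda(alpha), defined by Phi(-lambda) = alpha."""
    return -NormalDist().inv_cdf(alpha)

def fixed_sample_factor(alpha):
    """Factor of delta**-2 in (6.2)."""
    return lam(alpha) ** 2

def lower_bound_factor(alpha):
    """Factor of delta**-2 in (6.1)."""
    return (math.sqrt(1.0 - 2.0 * math.log(2.0 * alpha)) - 1.0) ** 2

ratios = []
for alpha in (1e-2, 1e-4, 1e-8, 1e-16):
    ratio = fixed_sample_factor(alpha) / lower_bound_factor(alpha)
    ratios.append(ratio)
    print(f"alpha={alpha:g}  lambda^2={fixed_sample_factor(alpha):.3f}  "
          f"bound factor={lower_bound_factor(alpha):.3f}  ratio={ratio:.3f}")
```

The ratio decreases toward 1 as $\alpha \to 0$, but only very gradually, which is the slow asymptotic approach mentioned above.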

Now consider the SPR test which stops as soon as $2\delta |S_n| \ge \log A$ ($> 0$).
Then $(\log A)^2 \le 4\delta^2 E_0(S_N^2) = 4\delta^2 E_0(N)$ by (1.8), and $A \le (1 - \alpha)/\alpha$. These
inequalities are close approximations for $\alpha$ fixed and $\delta$ small enough (Wald [10]).
With this approximation,

$$E_0(N) = \delta^{-2} \Big( \frac12 \log \frac{1 - \alpha}{\alpha} \Big)^2. \tag{6.3}$$

Put $\alpha = (1 - \epsilon)/2$. Then

$$\Big( \frac12 \log \frac{1 - \alpha}{\alpha} \Big)^2 = \epsilon^2 + \tfrac23 \epsilon^4 + \tfrac{23}{45} \epsilon^6 + \cdots$$

and

$$\big\{ [1 - 2 \log (2\alpha)]^{1/2} - 1 \big\}^2 = \epsilon^2 + \tfrac23 \epsilon^4 - \tfrac16 \epsilon^5 + \cdots.$$

Thus if $\alpha$ is close to its upper bound $\tfrac12$, and $\delta$ is small enough, the lower bound
in (6.1) is nearly attained with a SPR test. Hence the SPR test nearly
minimizes $E_0(N)$ in this case. Table 1 shows that even for $\alpha = 0.2$ the expected
sample size exceeds the lower bound by only 3%. (The lower bound in (1.2)
with $c = \tfrac12$ also approaches $E_0(N)$ for the SPR test as $\alpha \to \tfrac12$. However, inequality
(6.1) is better than (1.2), as applied to the present case, for all values of $\alpha$.)

TABLE 1
Values of $E_0(N)$ and of the lower bound in (6.1) for $\delta = 0.1$.

$\alpha$ =              0.0001   0.01          0.05    0.1     0.2    0.3

Fixed sample size    1383     [illegible]   270.6   164.3   70.8   27.5
SPR test             2121     527.9         216.7   120.7   48.0   17.9
Anderson's test      -        402.2         192.2   -       -      -
Lower bound (6.1)    1054     388.3         187.0   111.1   46.6   17.8

For $\alpha$ values not close to 0 or $\tfrac12$ we compare the bound in (6.1) with the
expected sample size of a test considered by Anderson [1]. This test stops as soon
as $|S_n| \ge c + dn$, where $d < 0 < c$. Anderson approximated the sequence $\{S_n\}$
by a Wiener process so that his values for the expected stopping time, $E_0(\tau)$,
when the mean of the process is 0 are approximations to $E_0(N)$. He chose the
constants $c$ and $d$ so as to minimize $E_0(\tau)$ subject to prescribed error
probabilities $\alpha_1 = \alpha_2 = \alpha$ at $\theta = \pm\delta$, for $\delta = 0.1$ and $\alpha = 0.01$ and 0.05. Anderson's
values are given in Table 1. The expected sample sizes exceed the lower bounds
by only 3.6% and 2.8%, respectively. This shows that both Anderson's test (as
judged by the expected sample size at $\theta = 0$) and inequality (6.1) cannot be
greatly improved in these cases.
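The rows of Table 1 that come from closed formulas can be recomputed directly. The sketch below is an illustration with hypothetical function names: it evaluates the SPR approximation (6.3) and the lower bound (6.1) for $\delta = 0.1$, and uses the two values for Anderson's test quoted in Table 1 (402.2 and 192.2) to recover the percentage excesses over the lower bound.

```python
import math

DELTA = 0.1

def spr_approx(alpha, delta=DELTA):
    """Approximation (6.3) to E_0(N) for the SPR test."""
    return (0.5 * math.log((1.0 - alpha) / alpha)) ** 2 / delta ** 2

def lower_bound_6_1(alpha, delta=DELTA):
    """Lower bound (6.1)."""
    return (math.sqrt(1.0 - 2.0 * math.log(2.0 * alpha)) - 1.0) ** 2 / delta ** 2

# Values for Anderson's test as quoted in Table 1.
anderson = {0.01: 402.2, 0.05: 192.2}

for alpha in (0.0001, 0.01, 0.05, 0.1, 0.2, 0.3):
    line = f"alpha={alpha:<7g} SPR={spr_approx(alpha):8.1f}  bound={lower_bound_6_1(alpha):8.1f}"
    if alpha in anderson:
        excess = 100.0 * (anderson[alpha] / lower_bound_6_1(alpha) - 1.0)
        line += f"  Anderson excess={excess:.1f}%"
    print(line)
```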

To conclude this section, it will be shown that for each of the two sequential
tests here considered the expected sample size attains its maximum when the
mean $\theta$ of the normal distribution is 0. In conjunction with the preceding results
this implies that each of these tests (as well as the fixed sample size test) comes
close to minimizing the maximum expected sample size for certain $\alpha$ values.

Both tests are such that sampling is stopped as soon as $|S_n| \ge c_n$, where
$c_1, c_2, \cdots$ are nonnegative constants. The expected value of $N$ at $\theta$ is the sum
of the probabilities $P[N > n \mid \theta]$. We can write

$$P[N > n \mid \theta] = \int_A f(y - \theta e)\, dy,$$

where $y = (y_1, \cdots, y_n)$, $e = (1, 1, \cdots, 1)$, $f$ is the probability density of $n$
independent normal random variables with mean 0 and variance 1, and $A =
\{y \mid |y_1 + \cdots + y_m| < c_m, m = 1, \cdots, n\}$. The set $A$ is convex, and $y \in A$
implies $-y \in A$. It follows from a theorem of Anderson [2] that $P[N > n \mid \theta]$
attains its maximum at $\theta = 0$ (and is monotone for $\theta < 0$ and $\theta > 0$). Thus
the same conclusion is true for the expected value of $N$.


7. Lower bounds for the average risk. In this section a sequence of increasingly
better lower bounds for the average risk of a general sequential procedure will
be derived. Under certain conditions these bounds converge to the minimum
average risk. They are similar to the bounds given by Blackwell and Girshick
[4] and will be obtained as a consequence of results of Wald and Wolfowitz [13]
which are also contained in Wald's book [12]. In slight extension of the
assumptions in [13] and [4], the cost per observation will be allowed to depend on the
parameter; due to this assumption the bounds can be used to obtain lower
bounds for the expected sample size (see Section 8).

The random variables $X_1, X_2, \cdots$ are assumed to be independent with a
common probability density $f_\theta$ with respect to a $\sigma$-finite measure $\mu$, where the
parameter $\theta$ is contained in a space $\Omega$. To simplify the exposition, the assumptions
of [13], Section 2, will be made (with some obvious changes in notation), with
two exceptions stated below. In particular, $\mu$ is Lebesgue or counting measure
on the real Borel sets (this is not essential); the loss function $W$ on $\Omega \times D$ is
nonnegative and bounded; the terminal decision space $D$ is compact in the sense
of the convergence $\sup_\theta |W(\theta, d_i) - W(\theta, d_0)| \to 0$; the a priori distributions
are the probability measures on a fixed Borel field of subsets of $\Omega$. The cost of
$m$ observations is assumed to be $c(\theta) m$, where $c(\theta)$ is nonnegative, bounded and
measurable on the given Borel field of subsets of $\Omega$. (In [13], $c(\theta)$ is a constant.)
In addition, we assume that the function $\inf_{\theta \in \Omega} f_\theta$ is Borel measurable. The class
$\Delta$ consists of all sequential decision functions $\delta$ which satisfy the needed
measurability conditions as specified in [13]. For the other measurability assumptions
we also refer to [13].

Denote by $r(\theta, \delta)$ the risk (expected loss plus expected cost) when the decision
function $\delta$ is used and the parameter is $\theta$. For any a priori distribution $\xi$ over $\Omega$
let $r(\xi, \delta) = \int r(\theta, \delta)\, d\xi$. Let $\rho(\xi)$ denote the infimum of the average risk $r(\xi, \delta)$
for $\delta \in \Delta$. Let

$$c(\xi) = \int c(\theta)\, d\xi, \qquad \rho_0(\xi) = \inf_{d \in D} \int W(\theta, d)\, d\xi, \qquad f_\xi(y) = \int f_\theta(y)\, d\xi,$$

and let $\xi_y$ denote the distribution over $\Omega$ defined by $d\xi_y = f_\theta(y)\, d\xi / f_\xi(y)$. Then
the function $\rho(\xi)$ satisfies the equation

$$\rho(\xi) = \min \Big[ \rho_0(\xi),\ \int \rho(\xi_y) f_\xi(y)\, d\mu(y) + c(\xi) \Big]. \tag{7.1}$$

This is a straightforward extension of Theorem 3.2 of [13].

For $n \ge 0$ let $\rho_n(\xi)$ denote the infimum of $r(\xi, \delta)$ for $\delta \in \Delta_n$, the class of all
decision functions in $\Delta$ which terminate after at most $n$ observations. (This is
consistent with the definition of $\rho_0(\xi)$ above.) By direct extension of Theorem
3.1 of [13] we have

$$\rho_n(\xi) = \min \Big[ \rho_0(\xi),\ \int \rho_{n-1}(\xi_y) f_\xi(y)\, d\mu(y) + c(\xi) \Big], \qquad n = 1, 2, \cdots. \tag{7.2}$$

Clearly $\rho_0(\xi) \ge \rho_1(\xi) \ge \rho_2(\xi) \ge \cdots \ge \rho(\xi)$. In [13] it is shown that if $c(\theta) =
c > 0$, then $\lim \rho_n(\xi) = \rho(\xi)$.
Blackwell and Girshick ([4], pp. 255-256) have given lower bounds for $\rho(\xi)$
which with the present cost function can be defined as follows. Let $r_0^*(\xi) = 0$
and define recursively for $n = 1, 2, \cdots$

$$r_n^*(\xi) = \min \Big[ \rho_0(\xi),\ \int r_{n-1}^*(\xi_y) f_\xi(y)\, d\mu(y) + c(\xi) \Big]. \tag{7.3}$$

Then $r_0^*(\xi) \le r_1^*(\xi) \le r_2^*(\xi) \le \cdots \le \rho(\xi)$, and if $c(\theta) = c > 0$, then
$\lim r_n^*(\xi) = \rho(\xi)$ [4].

It will now be shown that the lower bounds (7.3) can be improved with the
help of an inequality of Wald and Wolfowitz [13]. Sufficient conditions for the
convergence of these lower bounds and of the upper bounds $\rho_n(\xi)$ to $\rho(\xi)$ when
$c(\theta)$ is not constant will also be given.
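Let

$$\lambda = 1 - \int \inf_{\theta \in \Omega} f_\theta(y)\, d\mu(y). \tag{7.4}$$

Excluding the trivial case where all distributions $f_\theta$ are identical, we have
$0 < \lambda \le 1$. Now define

$$\rho_0'(\xi) = \min [\rho_0(\xi),\ \lambda^{-1} c(\xi)] \tag{7.5}$$

and recursively for $n = 1, 2, \cdots$

$$\rho_n'(\xi) = \min \Big[ \rho_0(\xi),\ \int \rho_{n-1}'(\xi_y) f_\xi(y)\, d\mu(y) + c(\xi) \Big]. \tag{7.6}$$

We shall write $f_{\theta,n}$ for $\prod_{j=1}^{n} f_\theta(x_j)$, $f_{\xi,n}$ for $\int f_{\theta,n}\, d\xi$ and $\xi^{(n)}$ for the a posteriori
distribution over $\Omega$ after $n$ observations $x_1, \cdots, x_n$, so that $d\xi^{(n)} = f_{\theta,n}\, d\xi / f_{\xi,n}$.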
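For a finite family of discrete densities the quantities in (7.4) and (7.5) reduce to finite sums. The sketch below is a toy illustration and not from the paper: the two Bernoulli densities, the uniform prior, the 0-1 loss and the constant cost 0.05 are all assumed values, and the function names are hypothetical.

```python
def lam(densities):
    """lambda of (7.4) for densities with respect to counting measure.

    densities: list of dicts mapping each sample point y to f_theta(y),
    all defined on the same finite set of points.
    """
    points = densities[0].keys()
    return 1.0 - sum(min(f[y] for f in densities) for y in points)

def rho0_prime(rho0, cost, lam_value):
    """Lower bound (7.5): min[rho_0(xi), c(xi) / lambda]."""
    return min(rho0, cost / lam_value)

# Assumed toy problem: Bernoulli observations with success probability 0.3 or 0.7.
f1 = {0: 0.7, 1: 0.3}
f2 = {0: 0.3, 1: 0.7}
prior = (0.5, 0.5)

lam_value = lam([f1, f2])          # 1 - (0.3 + 0.3)
rho0 = min(prior)                  # 0-1 loss: best terminal decision without observations
cost = 0.05                        # c(xi) for a constant cost per observation
print(lam_value, rho0_prime(rho0, cost, lam_value))
```

In this toy problem the bound says that no sequential procedure can have average risk below $\min(0.5,\ 0.05/0.4) = 0.125$.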
Theorem 1. We have

$$\rho_0'(\xi) \le \rho_1'(\xi) \le \rho_2'(\xi) \le \cdots \le \rho(\xi). \tag{7.7}$$

In order that

$$\lim_{n \to \infty} \rho_n'(\xi) = \lim_{n \to \infty} \rho_n(\xi) = \rho(\xi) \tag{7.8}$$

it is sufficient that either

$$\lim_{n \to \infty} \int \rho_0(\xi^{(n)}) f_{\xi,n}\, d\mu^n = 0 \tag{7.9}$$

or

$$\xi \{c(\theta) > 0\} = 1. \tag{7.10}$$

Remark 1. If $\lambda = 1$, then $\rho_{n-1}'(\xi) = r_n^*(\xi)$, so that the two sequences of
bounds are equivalent. We always have $\rho_{n-1}'(\xi) \ge r_n^*(\xi)$.

Remark 2. The integral in (7.9) is the risk of the (fixed sample size) Bayes
procedure based on $n$ observations when $c(\theta) = 0$. Thus condition (7.9) is
satisfied for all $\xi$ if the maximum expected loss of some decision rule based on $n$
observations tends to 0 as $n \to \infty$. An upper bound for the integral in (7.9)
(which, in turn, is an upper bound for $\rho_n(\xi) - \rho_n'(\xi)$) for the case of finite $\Omega$ is
given in Theorem 2 below.

Remark 3. In Section 8 it will be shown that the inequality $\rho(\xi) \ge \rho_0'(\xi)$
implies inequality (1.3). The discussion in Section 4 shows that equality in
$\rho(\xi) \ge \rho_0'(\xi)$ is attained in special cases.

Proof of Theorem 1. Since $\rho(\xi) = \inf_\delta \int r(\theta, \delta)\, d\xi$ and

$$\int \rho(\xi_y) f_\xi(y)\, d\mu(y) = \int \inf_\delta \Big[ \int r(\theta, \delta) f_\theta(y)\, d\xi(\theta) \Big]\, d\mu(y),$$

we have

$$\int \rho(\xi_y) f_\xi(y)\, d\mu(y) \ge \rho(\xi) \int \inf_\theta f_\theta(y)\, d\mu(y) = (1 - \lambda) \rho(\xi). \tag{7.11}$$

(This is essentially equivalent to inequality (3.22) of [13].) Hence, by (7.1), if
$\rho(\xi) < \rho_0(\xi)$, then $\rho(\xi) \ge \lambda^{-1} c(\xi)$. Therefore $\rho(\xi) \ge \rho_0'(\xi)$. It now follows from
(7.1) and (7.6) by induction that $\rho(\xi) \ge \rho_n'(\xi)$ for all $n \ge 0$.

To complete the proof of (7.7) we now show that

$$\rho_n'(\xi) \ge \rho_{n-1}'(\xi), \qquad n = 1, 2, \cdots. \tag{7.12}$$

It can be seen in a similar way as in the proof of (7.11) that

$$\int \rho_0'(\xi_y) f_\xi(y)\, d\mu(y) \ge (1 - \lambda) \rho_0'(\xi).$$

Hence, by (7.6) with $n = 1$,

$$\rho_1'(\xi) \ge \min [\rho_0(\xi),\ (1 - \lambda) \rho_0'(\xi) + c(\xi)].$$

It is readily shown that the right side of this inequality is equal to $\rho_0'(\xi)$. Thus
(7.12) is proved for $n = 1$. For $n = 2, 3, \cdots$ the result follows by induction from
(7.6).

To prove the remaining part of the theorem, we first observe that $\rho_n'(\xi)$ (just
as $r_n^*(\xi)$; see [4]) can be interpreted as the minimum average risk in a modified
decision problem. Let $D'$ denote the original terminal decision space $D$,
augmented by a terminal decision $d_0 \notin D$. Let the loss function be $W(\theta, d)$ if $d \ne d_0$,
but $\lambda^{-1} c(\theta)$ if $d = d_0$. The cost function is that of the original problem. Let $\Delta_n'$
denote the class of all sequential decision functions (subject to measurability
assumptions analogous to those in [13]) which terminate after at most $n$ ($\ge 0$)
observations, such that decision $d_0$ is allowed only after the $n$th observation
has been taken. If $r'(\theta, \delta)$ denotes the risk function in the modified problem, it
can be seen that the minimum of $r'(\xi, \delta)$ for $\delta$ in $\Delta_n'$ is equal to $\rho_n'(\xi)$ as defined
by (7.5) and (7.6).

Since $\rho_n(\xi) \le \rho(\xi) \le \rho_n^*(\xi)$, (7.8) will be proved if we show that

(7.13)  $\lim_{n \to \infty} [\rho_n^*(\xi) - \rho_n(\xi)] = 0.$

For a fixed a priori distribution $\xi$, let $\delta_n$ be a Bayes decision function in $\Delta_n$,
so that $\rho_n(\xi) = r'(\xi, \delta_n)$. Let $\delta_n^*$ be the decision function in $\Delta_n$ which is identical
with $\delta_n$ before the $n$th observation is taken and makes the optimal terminal
decision after the $n$th observation. Denote by $\nu_n = \nu_n(x_1, \cdots, x_{n-1})$ the
probability that the sample size $N'$ required by procedure $\delta_n$ is equal to $n$, given
that the first $n - 1$ observations are $x_1, \cdots, x_{n-1}$. Then

$\rho_n^*(\xi) - \rho_n(\xi) \le r(\xi, \delta_n^*) - r'(\xi, \delta_n) = \int \nu_n [\rho_0^*(\xi^{(n)}) - \rho_0(\xi^{(n)})] f_{\xi n}\, d\mu^n.$

Therefore

(7.14)  $\rho_n^*(\xi) - \rho_n(\xi) \le \int \nu_n \rho_0^*(\xi^{(n)}) f_{\xi n}\, d\mu^n.$


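It follows immediately that condition (7.9) is sufficient for (7.13) and hence
for (7.8). Also, if $W$ is an upper bound for $W(\theta, d)$ and hence for $\rho_0^*(\xi)$, (7.14)
implies

(7.15)  $\rho_n^*(\xi) - \rho_n(\xi) \le W \int \nu_n f_{\xi n}\, d\mu^n = W P_\xi(N' = n).$

Now

$W \ge \rho_n(\xi) = r'(\xi, \delta_n) \ge \int c(\theta) E_\theta(N')\, d\xi$
$\quad \ge \int c(\theta)\, n P_\theta(N' = n)\, d\xi \ge n^{1/2} \int_{\{c(\theta) \ge n^{-1/2}\}} P_\theta(N' = n)\, d\xi$
$\quad = n^{1/2} \left[ P_\xi(N' = n) - \int_{\{c(\theta) < n^{-1/2}\}} P_\theta(N' = n)\, d\xi \right]$
$\quad \ge n^{1/2} [P_\xi(N' = n) - \xi\{c(\theta) < n^{-1/2}\}].$

Thus

(7.16)  $P_\xi(N' = n) \le n^{-1/2} W + \xi\{c(\theta) < n^{-1/2}\}.$

Letting $n \to \infty$, it follows from (7.15) and (7.16) that condition (7.10) is
sufficient for (7.8). This completes the proof of the theorem.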

The section is concluded by a theorem which shows that if $\Omega$ is finite, then
under a natural assumption on the loss function the difference $\rho_n^*(\xi) - \rho_n(\xi)$
converges to 0 uniformly in $\xi$ at an exponential rate.

Theorem 2. If $\Omega$ consists of $k$ points, $\theta = 1, 2, \cdots, k$, say, and if for each $\theta \in \Omega$
there is a $d_\theta \in D$ such that $W(\theta, d_\theta) = 0$, then

(7.17)  $\rho_n^*(\xi) - \rho_n(\xi) \le W(k - 1)\gamma^n,$

where $W$ is an upper bound for $W(\theta, d)$ and

$\gamma = \max_{i \ne j} \int (f_i f_j)^{1/2}\, d\mu.$

We remark that $\gamma < 1$ if it is understood that no two of the functions $f_1, \cdots, f_k$
are densities of the same distribution. The theorem exhibits a particularly
simple bound for $\rho_n^*(\xi) - \rho_n(\xi)$; closer bounds are contained in the proof.

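To prove the theorem we note that by (7.14)

(7.18)  $\rho_n^*(\xi) - \rho_n(\xi) \le \int \rho_0^*(\xi^{(n)}) f_{\xi n}\, d\mu^n.$

Let $\xi$ assign probability $g_i$ to the point $\theta = i$. Then

$\int \rho_0^*(\xi^{(n)}) f_{\xi n}\, d\mu^n = \int \inf_d \sum_{i=1}^k g_i W(i, d) f_{in}\, d\mu^n$
$\quad \le \int \min_j \sum_{i=1}^k g_i W(i, d_j) f_{in}\, d\mu^n \le \int \sum_{j=1}^k \phi_j \sum_{i=1}^k g_i W(i, d_j) f_{in}\, d\mu^n,$

where $\phi_1, \cdots, \phi_k$ are arbitrary nonnegative measurable functions of $x_1, \cdots, x_n$
such that $\phi_1 + \cdots + \phi_k = 1$. Hence, recalling that $W(i, d_i) = 0$,

$\int \rho_0^*(\xi^{(n)}) f_{\xi n}\, d\mu^n \le W \int \sum_{i \ne j} \phi_j g_i f_{in}\, d\mu^n = W \sum_{i=1}^k g_i \int (1 - \phi_i) f_{in}\, d\mu^n.$

Let, in particular, $\phi_i = \prod_{j=1}^k \phi_{ij}$, where $\phi_{ii} = 1$ and for $i < j$, $1 - \phi_{ji} =
\phi_{ij} = 1$ if $f_{in} > f_{jn}$ and $= 0$ otherwise. The conditions $\phi_i \ge 0$ and $\phi_1 + \cdots
+ \phi_k = 1$ are satisfied. (Note that if $\phi_i = 1$, then $f_{in} = \max_j f_{jn}$.) By Lemma 6
of Section 2, $1 - \phi_i \le \sum_{j=1}^k (1 - \phi_{ij})$, where the term with $j = i$ is zero. Hence

$\int (1 - \phi_i) f_{in}\, d\mu^n \le \sum_{j=1}^k \int (1 - \phi_{ij}) f_{in}\, d\mu^n.$

Now if $i \ne j$,

$\int (1 - \phi_{ij}) f_{in}\, d\mu^n \le \int \min(f_{in}, f_{jn})\, d\mu^n \le \int (f_{in} f_{jn})^{1/2}\, d\mu^n = \left[ \int (f_i f_j)^{1/2}\, d\mu \right]^n \le \gamma^n.$

Hence $\int \rho_0^*(\xi^{(n)}) f_{\xi n}\, d\mu^n \le W \sum g_i (k - 1)\gamma^n = W(k - 1)\gamma^n$, and the theorem
follows from (7.18).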
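
The bound of Theorem 2 is easy to evaluate once the overlap integrals are known. The following is only a minimal sketch, assuming a finite sample space so that each integral reduces to a sum. The three Bernoulli distributions, the loss bound `W = 1.0` and the tolerance are hypothetical values chosen for illustration, not taken from the text.

```python
import math
from itertools import combinations

def gamma(densities):
    """max over i != j of sum_x sqrt(f_i(x) f_j(x)) for discrete densities."""
    return max(
        sum(math.sqrt(fi[x] * fj[x]) for x in range(len(fi)))
        for fi, fj in combinations(densities, 2)
    )

def bound(W, k, g, n):
    """Right side of (7.17): W (k - 1) gamma^n."""
    return W * (k - 1) * g ** n

def smallest_n(W, k, g, tol):
    """Smallest n with W (k - 1) gamma^n <= tol (requires gamma < 1)."""
    n = 0
    while bound(W, k, g, n) > tol:
        n += 1
    return n

# Hypothetical example: three Bernoulli distributions on {0, 1}.
densities = [(1 - p, p) for p in (0.2, 0.5, 0.8)]
g = gamma(densities)
n_needed = smallest_n(W=1.0, k=len(densities), g=g, tol=0.01)
print(g, n_needed)
```

Because the bound decreases geometrically, halving the tolerance adds only a fixed number of observations, which is the exponential rate referred to above.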


8. Further lower bounds for the expected sample size. The lower bounds for
$\rho(\xi)$ in Section 7 can be used to obtain lower bounds for the expected sample
size at a specified parameter point $\theta_0$ in terms of upper bounds on the expected
loss or (by choosing a suitable loss function) in terms of upper or lower bounds
on the probabilities of various decisions at selected parameter points. For this
purpose one chooses the cost function so that $c(\theta) = 0$ for $\theta \ne \theta_0$ and $c(\theta_0) > 0$.
The explicit result will be stated only for a two-decision problem; extensions to
problems involving more than two decisions will be obvious.

Let $\Omega$ consist of the three points 0, 1, 2, and let there be two decisions $d_1$ and
$d_2$. Put $W(1, d_2) = W(2, d_1) = 1$, $W(i, d_j) = 0$ otherwise, $c(0) = 1$, $c(1) =
c(2) = 0$. Let $\xi$ assign probability $g_i$ to the point $i$ ($i = 0, 1, 2$). Then $\rho_0^*(\xi) =
\min(g_1, g_2)$, $c(\xi) = g_0$, and, with $\delta = \{\psi_n, \phi_n\}$,

$r(\xi, \delta) = g_0 E_0(N) + g_1 E_1(\phi_N) + g_2 E_2(1 - \phi_N).$

For any $n \ge 0$, $r(\xi, \delta) \ge \rho_n(\xi)$. Hence if $E_1(\phi_N) \le \alpha_1$ and $E_2(1 - \phi_N) \le \alpha_2$,

(8.1)  $E_0(N) \ge \max_\xi \left( (\rho_n(\xi) - g_1\alpha_1 - g_2\alpha_2)/g_0 \right), \qquad n = 0, 1, 2, \cdots.$

This gives a sequence of increasingly better lower bounds for $E_0(N)$. In particular,
$\rho_0(\xi) = \min(g_1, g_2, \lambda^{-1}g_0)$, where $\lambda = 1 - \int \min(f_0, f_1, f_2)\, d\mu$. The ratio
in (8.1) with $n = 0$ is maximized by letting $g_1 = g_2 = \lambda^{-1}g_0$, and the resulting
inequality is equivalent to (1.3).
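
The maximization in (8.1) with n = 0 can be checked numerically. The sketch below assumes the form rho_0(xi) = min(g_1, g_2, g_0/lambda) and searches a grid of priors on the simplex. The values of `lam`, `a1` and `a2` are hypothetical. With g_1 = g_2 = g_0/lambda the ratio reduces to (1 - a1 - a2)/lam, which the grid maximum should reproduce when the maximizing prior lies on the grid.

```python
def rho0(g0, g1, g2, lam):
    """rho_0(xi) = min(g1, g2, g0 / lam) for the three-point problem (assumed form)."""
    return min(g1, g2, g0 / lam)

def ratio(g0, g1, g2, lam, a1, a2):
    """The ratio in (8.1) with n = 0."""
    return (rho0(g0, g1, g2, lam) - g1 * a1 - g2 * a2) / g0

def best_on_grid(lam, a1, a2, steps=100):
    """Maximize the ratio over priors (g0, g1, g2) on a grid of the simplex."""
    best = (float("-inf"), None)
    for i in range(1, steps):                  # g0 = i / steps must be positive
        for j in range(0, steps - i + 1):
            g0, g1 = i / steps, j / steps
            g2 = (steps - i - j) / steps
            best = max(best, (ratio(g0, g1, g2, lam, a1, a2), (g0, g1, g2)))
    return best

lam, a1, a2 = 0.5, 0.05, 0.05                  # hypothetical values
value, prior = best_on_grid(lam, a1, a2)
print(value, prior, (1 - a1 - a2) / lam)
```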


REFERENCES 


[1] T. W. Anderson, "A modification of sequential analysis to reduce the sample size,"
Ann. Math. Stat., Vol. 31 (1960), pp. 165-197.
[2] T. W. Anderson, "The integral of a symmetric unimodal function," Proc. Amer. Math.
Soc., Vol. 6 (1955), pp. 170-176.
[3] David Blackwell, "On an equation of Wald," Ann. Math. Stat., Vol. 17 (1946), pp.
84-87.
[4] David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions,
John Wiley and Sons, New York, 1954.
[5] T. G. Donnelly, "A family of sequential tests," Ph.D. dissertation, University of
North Carolina, 1957.
[6] Wassily Hoeffding, "A lower bound for the average sample number of a sequential
test," Ann. Math. Stat., Vol. 24 (1953), pp. 127-130.
[7] J. Kiefer and Lionel Weiss, "Some properties of generalized sequential probability
ratio tests," Ann. Math. Stat., Vol. 28 (1957), pp. 57-74.
[8] J. Seitz and K. Winkelbauer, "Remark concerning a paper of Kolmogorov and
Prohorov," Czechoslovak Math. J., Vol. 3 (78) (1953), pp. 89-91. (Russian with
English summary.)
[9] Abraham Wald, "Differentiation under the expectation sign in the fundamental
identity of sequential analysis," Ann. Math. Stat., Vol. 17 (1946), pp. 493-497.
[10] Abraham Wald, Sequential Analysis, John Wiley and Sons, New York, 1947.
[11] Abraham Wald, "Sequential tests of statistical hypotheses," Ann. Math. Stat., Vol.
16 (1945), pp. 117-186.
[12] Abraham Wald, Statistical Decision Functions, John Wiley and Sons, New York, 1950.
[13] A. Wald and J. Wolfowitz, "Bayes solutions of sequential decision problems," Ann.
Math. Stat., Vol. 21 (1950), pp. 82-99.
[14] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio
test," Ann. Math. Stat., Vol. 19 (1948), pp. 326-339.
[15] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Ann. Math. Stat., Vol. 18 (1947), pp. 215-230.





SAMPLING INSPECTION AS A MINIMUM LOSS PROBLEM 


By B. L. vAN DER WAERDEN 
Zürich University

Introduction. In March 1958, in a lecture at Berkeley, Milton Friedman 
pointed out that statisticians, when asked to recommend a sampling inspection 
plan for the producer or buyer of a mass product, usually ask him questions 
which he cannot answer. They first ask: What percentage of defectives would 
you allow without rejecting the product? If the answer is $p_0$, the statistician
would choose a smaller percentage $p_1$ and a larger one $p_2$, and ask questions
like: Would you allow a probability of 5 per cent of rejecting the product, if
the true percentage is $p_1$? The answers usually are mere guesses.

However, a competent manager could answer questions like: What is the cost 
of inspecting a sample of n? What would be your profit or loss if you buy or 
sell a lot with defective fraction p? What do you do if you reject the product, 
and what would be your loss in this case? Would it be very expensive to improve 
the quality of your product? A reasonable production and inspection plan ought 
to be based solely on these loss functions. 

In what follows, we shall leave aside production problems. We shall assume 
a plant to produce a product of variable quality, the variations being due to 
accidents we cannot prevent. The only thing the producer can do is to inspect 
a sample and, if it contains too many defectives, to examine the whole lot and 
to eliminate the defectives. And the only thing the buyer can do is to inspect 
a sample and, if it contains too many defectives, to return the product to the 
producer. 

The loss functions will be assumed to be linear functions of the defective 
fraction p, and the inspection cost to be proportional to the size n of the sample. 
We shall assume that the same inspection plan is used every day, or in the
buyer’s case every time he buys a lot, so that in the long run only the average 
loss counts. 

In Sections 1-3 and in Section 4, the minimum loss problem will be discussed 
from the producer’s and from the buyer’s point of view separately. In Section 5, 
the producer’s and the buyer’s point of view will be combined. It will be shown 
that the two partners may increase their joint profit by forming a coalition and 
combining their inspection plans into one. 

After having finished an earlier draft of this paper, I learned that S. Moriguti
[1] and S. Ura [2] investigated the problem of minimax inspection plans from
exactly the same point of view. Moriguti’s results are just the same as mine
obtained in Section 2 for Case A ($p_0 n$ and $q_0 n$ large). Ura’s results are close to
mine obtained in Section 3 for Case B ($n$ large, $p_0 n$ not large), but there are
slight differences in the numerical values. On the other hand, Ura treated the
cases $k = 1, 2, 3, 4$ and 5, whereas my calculations stop at $k = 2$. Since my
assumptions are a little more general than those of Moriguti and Ura and since
their papers are not easily accessible, I shall present the theory anew.

Received June 6, 1958; revised February 19, 1960.

It would be interesting to extend the theory to sequential tests. A guess con- 
cerning the best sequential test will be formulated at the end of Section 4. 


1. The producer’s problem. 

A. The loss function. Suppose a producer takes a sample of his product every 
day and applies a test. If the lot is accepted, it is sold at a fixed price, but de- 
fective units may be sent back by the buyer. The resulting loss is proportional 
to the defective fraction p, so the loss will be, in the case of acceptance, 


(1) L’ = ap. 


If the product is rejected, the whole lot will be examined and the defectives will
not be sold. Let the cost of inspection of the whole lot be $c$, and the loss resulting
from not selling the defectives $bp$. Hence, the loss in the case of rejection is

(2)  $L'' = bp + c.$

On the other hand, if rejected lots are discarded (which is, of course, always
the case when inspection is destructive), we have to replace (2) by $L'' = c$,
that is, we have to put $b = 0$ in (2).

To these losses, we have to add the cost of inspection of the sample. For the
sake of simplicity, we shall suppose that the test is not sequential, so that the
cost of inspection is simply $fn$, where $n$ is the size of the sample. Thus, in the
case of acceptance, the loss becomes $L' + fn$. In the case of rejection and 100%
inspection, we have already inspected a sample of $n$ at cost $fn$ and we still have
to inspect the rest of the lot at cost $c - fn$, so the term $fn$ cancels out and the
total loss is simply $L''$.

For any $p$, let $P$ and $Q$ be the probabilities of acceptance and rejection by a
given test. The expectation value of the loss is

(3)  $L = P(L' + fn) + QL''.$


This loss function has been considered by Weibull [3] and others. However, as 
Hamaker [4] has rightly remarked, the sample size is usually small as compared 
with the size of the lot. Therefore we may, for all practical purposes, replace 
L” in (3) by L” + fn, thus obtaining the simpler formula 


(4) L = PL’ + QL” + fn = L’” + fn, 


which was also adopted by Anscombe [5] and others. 

If rejected lots are discarded, the same formula (4) holds, only we have to 
put b = 0 in (2). The formula thus obtained also holds in the case of destructive 
testing. 

For $p = 0$, $L''$ is larger than $L'$. For $p = 1$, we may suppose that $L'$ is larger
than $L''$, for it is usually better not to sell a totally defective product than to
sell it and to have to take it back with a complaint from the buyer. So the two
straight lines (1) and (2) in the $(p, L)$-plane intersect at a point $S$ with
coordinates $(p_0, L_0)$,

(5)  $p_0 = c/(a - b).$

Fig. 1


We shall always put $p + q = 1$ and $p_0 + q_0 = 1$.

If we knew $p$, the best procedure would be to accept the product for $p < p_0$
and to reject it for $p > p_0$. The loss would then be $L_u = \min(L', L'')$, or

(6)  $L_u = ap$ for $p \le p_0$,
     $L_u = bp + c$ for $p > p_0$.

The function $L_u$ is represented by a heavy line in Fig. 1. We may call $L_u$ the
unavoidable loss.

The sum $L''' = PL' + QL''$ occurring in (4) is represented, in our diagram,
by a curve passing through the point $S$. For $p = 0$, we may assume that
acceptance is certain, and, for $p = 1$, that rejection is certain. So the curve $L'''$
passes through the origin $O$ and the end point $B$ of the line $L''$.

B. The minimax loss solution. If we try to minimize the maximum loss, we
only find the following trivial solution: Reject without taking a sample. For if we
follow this procedure, the loss is $L''$ and the maximum loss is $b + c$ (point $B$ in
the diagram), whereas for all other procedures the maximum loss is larger.

This extremely cautious procedure may be quite reasonable, e.g. in cases
where the producer knows that some accident happened in the production
process and decides to inspect the whole lot without first taking a sample. In
most cases, however, we do not expect $p$ to be large. In those cases, the
pessimistic minimax loss procedure is not justified.

C. Introduction of an a priori distribution. If we endeavor to formulate our
feeling that $p$ is not likely to be large as an exact mathematical hypothesis, we
may introduce a process distribution function, $F(p)$, which becomes nearly 1
for unlikely large values of $p$. We now have to minimize the expectation of the
loss,

(7)  $E(L) = \int L\, dF(p).$


The exact minimum of E(L) can be determined only if the distribution func- 
tion F(p) is known. The derivative F’(p) is usually called the process character- 
istic. Various functions F(p) or F’(p) have been proposed by several authors. 
Hamaker [6] gives pictures of five of these functions, and Horsnell [7] lists eight 
proposals. However, in the discussion to Horsnell’s paper, Barnard observes: 

“We at Imperial College did some work in trying to find out what sort of 
process curves do in fact turn up in industry and none we have seen bears the 
slightest resemblance to those tabulated in Table I’’. 

To this information, Hamaker [6] adds: “Apart from this, industry is con- 
stantly changing its products and processes, and by the time we have collected 
a sufficient number of data for a more detailed analysis some changes may be 
introduced which completely alter the situation.” 

Therefore, it seems that the various theoretical solutions of the minimum 
problem of the loss expectation (7) are of little practical value. We have to 
admit that we know next to nothing about the actual process function F(p), 
and we must try to find a practical solution depending only on the loss con- 
stants a, b, c, f. 

D. The minimum regret problem. The loss $L$ may be split into two parts

(8)  $L = L_u + R.$

The first term $L_u$ is the unavoidable loss. The second term, the excess of the
loss over the unavoidable loss, is called the regret. Substituting for $L$ and $L_u$
the expressions (4) and (6), we obtain

(9)  $R = (a - b)(p - p_0)P + fn$ for $p > p_0$,
     $R = (a - b)(p_0 - p)Q + fn$ for $p \le p_0$,

and

(10)  $E(L) = \int L_u\, dF(p) + \int R\, dF(p).$

The first term on the right is independent of the test procedure. Therefore, in
order to minimize $E(L)$, we have to minimize the second term, the expectation
of the regret,

(11)  $E(R) = \int R\, dF(p).$
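
As a numerical check of the regret formulas, the sketch below evaluates the loss (4) for a single-sampling plan, subtracts the unavoidable loss, and compares the difference with (9). The binomial acceptance probability, the acceptance number `k` and all numerical constants are illustrative assumptions, since the text leaves the test unspecified at this point.

```python
from math import comb

def accept_prob(p, n, k):
    """P: probability of fewer than k defectives in a binomial sample of n."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k))

def loss(p, n, k, a, b, c, f):
    """Expected loss (4): L = P L' + Q L'' + f n."""
    P = accept_prob(p, n, k)
    return P * a * p + (1 - P) * (b * p + c) + f * n

def unavoidable_loss(p, a, b, c):
    """Unavoidable loss (6): the smaller of a p and b p + c."""
    return min(a * p, b * p + c)

def regret(p, n, k, a, b, c, f):
    """Regret (9), written with p0 = c / (a - b)."""
    p0 = c / (a - b)
    P = accept_prob(p, n, k)
    if p > p0:
        return (a - b) * (p - p0) * P + f * n
    return (a - b) * (p0 - p) * (1 - P) + f * n

# Hypothetical constants: a = 10, b = 2, c = 0.8 (so p0 = 0.1), f = 0.001.
a, b, c, f, n, k = 10.0, 2.0, 0.8, 0.001, 50, 5
grid = [i / 1000 for i in range(1001)]
max_gap = max(abs(loss(p, n, k, a, b, c, f) - unavoidable_loss(p, a, b, c)
                  - regret(p, n, k, a, b, c, f)) for p in grid)
max_regret = max(regret(p, n, k, a, b, c, f) for p in grid)
print(max_gap, max_regret)
```

The first printed number should be at rounding-error level, confirming that the loss minus the unavoidable loss agrees with (9). The second is the maximum regret of this particular plan over the grid.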
For any given test, let $M$ be the maximum of the regret $R = R(p)$. The
formula (11) implies, of course, that $E(R) \le M$. Hence, if we make $M$ small,
we are sure that $E(R)$ is always small, no matter what $F(p)$ is. Thus, we are
led to the following minimax problem: To find a test making the maximum of the
regret (9) as small as possible.

Let $m$ be the minimum of the maximum regret, $M$, for all possible tests. It is
clear that the minimum exists. Let $T_1$ be a test for which the maximum regret
is just $m$. The formula (11) implies for the test $T_1$ that $E(R) \le m$ for all
$F(p)$.

On the other hand, let $T$ be a test for which the maximum regret $M$ is larger
than $m$, and let $p_1$ be a value for which $R(p_1)$ is just $M$. We now may define a
function $F(p)$ which jumps from 0 to 1 for $p = p_1$. For this function $F(p)$ the
regret expectation is $E(R) = M > m$.

Hence, if we want to have a test for which the regret expectation is always
$\le m$, no matter what $F(p)$ is, our only possibility is to take a minimax regret
test $T_1$. All other tests yield, for some $F(p)$, a larger regret expectation, $E(R)$,
and hence a larger loss expectation $E(L)$.

Empirically I have found that the minimax regret tests $T_1$ are also minimum
$E(L)$ tests for some suitable process function $F(p)$. The process functions I
used were quite reasonable. I assumed, as other authors did, that $p$ may take
two values $p_1 < p_0$ and $p_2 > p_0$ with certain probabilities. Processes of this
type, which usually produce a satisfactory product but sometimes a bad one,
do occur. Thus, we see that the minimax regret tests are quite good also from
the minimum $E(L)$ point of view under reasonable assumptions concerning the
process function $F(p)$.

The minimax problem will be solved in two cases, (A) and (B). In case (A),
$p_0 n$ and $q_0 n$ are both large. In case (B), the sample size $n$ is still assumed to be
large, but $np_0$ is not. Case (A) was investigated by S. Moriguti [1], case (B)
by S. Ura [2]. For sequential tests see Section 4 C.


2. The minimum regret solution in case (A). Let $p_0 n$ and $q_0 n$ both be large.
Let $h$ be the fraction of defectives found in the sample. Let $h_0$ be a critical
fraction near $p_0$, so that the test

accepts, if $h < h_0$,
rejects, if $h > h_0$.

We are interested only in $p$-values near $h_0$ and hence near $p_0$; for, if $p$ is much
larger or much smaller than $h_0$, rejection or acceptance is nearly certain. For
those $p$-values, the variance of $h$,

(12)  $\sigma^2 = n^{-1} p q,$

may be approximated by the constant

(13)  $\sigma_0^2 = n^{-1} p_0 q_0.$

Let $\Phi$ be the normal distribution function with mean zero and unit variance.
The probability of acceptance may be approximated by

(14)  $P = \Phi((h_0 - p)/\sigma_0).$





The regret now becomes

(15)  $R_+ = (a - b)(p - p_0)\Phi((h_0 - p)/\sigma_0) + fn,$

(16)  $R_- = (a - b)(p_0 - p)\Phi((p - h_0)/\sigma_0) + fn.$

Putting

(17)  $p - p_0 = \sigma_0 x,$

(18)  $h_0 - p_0 = \sigma_0 s,$

we may write, for positive $x = z$,

(19)  $R_+ = R_+(z) = (a - b)\sigma_0 z\Phi(s - z) + fn,$

and, for negative $x = -z$,

(20)  $R_- = R_-(z) = (a - b)\sigma_0 z\Phi(-s - z) + fn.$


If $s$ is positive, the function $R_+(z)$ is always larger than $R_-(z)$, hence the
maximum of $R_+$ is larger than the maximum of $R_-$. Between the two maxima
lies the maximum of

(21)  $R_0(z) = (a - b)\sigma_0 z\Phi(-z) + fn.$

Hence, for $s > 0$, the overall maximum of the regret $R$ is the maximum of $R_+$,
and it is larger than the maximum of $R_0$. For $s = 0$ the regret is $R_0$. For $s < 0$,
the overall maximum is the maximum of $R_-$, and it is larger than the maximum
of $R_0$. Hence, in order to minimize the maximum regret, we have to assume
$s = 0$, or $h_0 = p_0$. The test now

accepts, if $h < p_0$,
rejects, if $h > p_0$,

and the regret is, for $x = +z$ as well as for $x = -z$,

(22)  $R_0(z) = gz\Phi(-z) + fn,$

with

(23)  $g = (a - b)\sigma_0.$


The function $z\Phi(-z)$ is zero for $z = 0$ and again for $z = \infty$. The maximum
of the function is

(24)  $C = .170$ (for $z = .752$).

Hence the maximum regret is

(25)  $R_{\max} = Cg + fn.$

Substituting $g$ and $\sigma_0$ from (23) and (13), we obtain

(26)  $R_{\max} = jn^{-1/2} + fn$

with

(27)  $j = C(a - b)(p_0 q_0)^{1/2}.$

We now determine $n$ by minimizing $R_{\max}$. This gives for $n$ the approximation

(28)  $n' = (\tfrac{1}{2}C)^{2/3}((a - b)/f)^{2/3}(p_0 q_0)^{1/3} = .193\,((a - b)/f)^{2/3}(p_0 q_0)^{1/3}.$

To this approximate value, we have to take the nearest integer $n$ and to make
sure that $np_0$ and $nq_0$ are really large.
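
The constants in (24) and (28) can be recomputed in a few lines. This is only a sketch: the ternary search helper `maximize` is an assumption of this example (it relies on z Phi(-z) having a single maximum), and the numerical constants passed to `sample_size` are hypothetical.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def maximize(h, lo, hi, iters=200):
    """Ternary search for the maximum of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if h(m1) < h(m2):
            lo = m1
        else:
            hi = m2
    x = 0.5 * (lo + hi)
    return x, h(x)

# C = max over z of z * Phi(-z), as in (24).
z_star, C = maximize(lambda z: z * Phi(-z), 0.0, 5.0)

def sample_size(a, b, f, p0):
    """Approximate optimum n' of (28), which minimizes j n^(-1/2) + f n."""
    q0 = 1.0 - p0
    return ((0.5 * C) ** (2.0 / 3.0) * ((a - b) / f) ** (2.0 / 3.0)
            * (p0 * q0) ** (1.0 / 3.0))

def max_regret(n, a, b, f, p0):
    """R_max of (26) for a given sample size n."""
    j = C * (a - b) * sqrt(p0 * (1.0 - p0))
    return j / sqrt(n) + f * n

print(z_star, C, (0.5 * C) ** (2.0 / 3.0))
# Hypothetical constants: a - b = 8, f = 0.001, p0 = 0.5.
n_prime = sample_size(10.0, 2.0, 0.001, 0.5)
print(n_prime, round(n_prime))
```

For the hypothetical constants used here n' comes out a little under 50, so both np_0 and nq_0 are near 24 and the normal approximation of case (A) is plausible.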

The critical number $k$ of defectives in the sample, i.e. the number which just
leads to rejection, is the next larger integer to

(29)  $w = np_0.$

If $p_0$ is near $\tfrac{1}{2}$, and $w + \tfrac{1}{2}$ near an integer, the approximations used here are
good, even if $n$ is not very large. However, in many cases $p_0$ is much less than $\tfrac{1}{2}$.
In these cases, it is no longer admissible to replace $\sigma$, as given by (12), by $\sigma_0$,
as given by (13). For $p > p_0$ the true $\sigma$ will be larger than $\sigma_0$, and for $p < p_0$
less. Hence $R_+(z)$ will become larger and $R_-(z)$ less. To minimize the maximum
regret, we have to assume a negative $s$, which means that the critical value $h_0$
becomes slightly less than $p_0$, and the critical number of defectives less than
$w + \tfrac{1}{2}$.

It would be interesting to replace these qualitative considerations by more
accurate evaluations, to derive correction terms to the formulas for $R_{\max}$ and
$n'$, and to find a more accurate asymptotic formula for the critical integer $k$.
For $p_0 \ll 1$ this has been done by Ura [2]. His asymptotic formula (19) may be
written, in our notations, as $k = w + .145$.


3. The minimum regret solution in case (B). 

A. The Poisson approximation. If $np_0$ is less than 4, the approximation used
in case (A) may no longer be good. Assuming $n$ to be large, we may use the
Poisson approximation to the binomial distribution. Putting $\mu = np$, we obtain
for the probability of finding just $y$ defectives in a sample of $n$

(30)  $P_y = e^{-\mu}(\mu^y/y!).$

If $k$ is the critical value, which just leads to rejection, we have

(31)  $P = P_0 + \cdots + P_{k-1},$
      $Q = 1 - P.$

The regret is

(32)  $R_+ = (a - b)(p - p_0)P + fn$ for $p > p_0$,
      $R_- = (a - b)(p_0 - p)Q + fn$ for $p \le p_0$.

To get rid of inessential constants we introduce new variables $v$, $w$, $S$ instead
of $p$, $n$, $R$ by putting

(33)  $p/p_0 = v, \quad p_0 n = w, \quad (p_0/f)R = S.$

The Poisson constant is now $\mu = vw$, and the new regret function $S$ is

(34)  $S_+ = t(v - 1)P + w$ for $v > 1$,
      $S_- = t(1 - v)Q + w$ for $v \le 1$,

with

(35)  $t = ((a - b)/f)p_0^2 = (c/f)p_0.$


We are now left with only one independent variable $t$. The procedure for
determining $v$ and $w$ and the critical integer $k$ is as follows. We first determine $v$
by maximizing $S_+$ or $S_-$, whichever has the largest maximum. The resulting
maximum regret $M = S_{\max}$ is, for every choice of $k$, a function of $w$. We next
determine $w$ so as to minimize $M$. The resulting minimum $m$ depends only on
$k$. We finally determine $k$ so as to minimize $m$. The corresponding value of $w = p_0 n$
determines the size $n$ of the sample.

For small t, the best choice of k will be k = 1. This means that as soon as one 
defective is found, the product is rejected. We shall investigate this case in 
greater detail. 


B. The case k = 1. For $k = 1$, formulae (31) simplify to

(36)  $P = e^{-vw},$
      $Q = 1 - e^{-vw}.$

The regrets are

(37)  $S_+ = t(v - 1)e^{-vw} + w$  $(v > 1)$,

(38)  $S_- = t(1 - v)(1 - e^{-vw}) + w$  $(v \le 1)$.

The maximum of $S_+$ is found by differentiating with respect to $v$. The result is

(39)  $v = 1 + w^{-1}.$

Substituting into (37), we obtain the maximum of $S_+$,

(40)  $M_+ = tw^{-1}e^{-(1+w)} + w.$

We shall write this result as

(41)  $M_+ = tf(w) + w,$

where $f(w)$ is a known function of $w$. By the same method, we may determine
the maximum of $S_-$ as

(42)  $M_- = tg(w) + w,$

where $g(w)$ is a known function, which may be computed numerically for every
value of $w$.


Plotting $M_+$ and $M_-$ as functions of $w$, we get, for different values of $t$, graphs
like those of Fig. 2 or Fig. 3. In every case, we have to consider
$M(w) = \max(M_+, M_-)$ as a function of $w$ and to find the minimum of this
function $M(w)$.

Fig. 2    Fig. 3

The function $M_+$ decreases from $\infty$ to a minimum $m_+$ and then increases to
$\infty$. The function $M_-$ is always increasing. The intersection of the curves for
$M_+$ and $M_-$ is given by the equation

(43)  $f(w) = g(w).$

Since $f$ is a decreasing and $g$ an increasing function, there is only one solution of
(43), viz. $w_1 = .868$. This $w_1$ does not depend on $t$. We now may distinguish two
cases:

Case of Fig. 2. If the function $M_+$ is increasing at $w_1$, the minimum $m$ of
$M(w)$ is equal to the minimum $m_+$ of $M_+$.

Case of Fig. 3. If $M_+$ is not increasing at $w_1$, the minimum $m$ of $M(w)$ is the
common value of $M_+$ and $M_-$ at $w_1$.

The condition for $M_+$ to be increasing at $w_1$ is

(44)  $tf'(w_1) + 1 > 0.$

Now $f'(w_1)$ is negative, so condition (44) is satisfied for small $t$, but not for
large $t$. The limit between the cases of Fig. 2 and Fig. 3 is $t_1 = 2.61$.

For $t \le t_1$ the minimum regret $m$ is the minimum $m_+$ of $M_+$. Equating the
derivative of $M_+$ to zero, we find

(45)  $t = (w + 1)^{-1}w^2e^{w+1},$
(46)  $m_+ = 2w - (w + 1)^{-1}w^2.$

By (45), we may compute $t$ as a function of $w$ and plot the result in the
$(t, w)$-plane. Only the part of the curve between $w = 0$, $t = 0$ and $w_1 = .868$,
$t_1 = 2.61$ is needed. For $t \ge 2.61$, the optimum value of $w$ remains constant $=
w_1$, so the next part of the graph for $w$ is a horizontal line $w = w_1$ (see Fig. 4).
This line has to be extended to the right until $t$ becomes so large that $k = 2$
would be more favorable than $k = 1$. When does this happen?

C. The case k = 2. The method for finding the optimum value of $w$ for $k = 2$
is the same as for $k = 1$, the only difference being that the case of Fig. 2 does
not occur any more. If we compute the minimum $m_+$ of the maximum $M_+$ of
the function

(47)  $S_+ = t(v - 1)(1 + vw)e^{-vw} + w,$

we find that this minimum is larger than $m_+$ as given by (46) for all values of
$t$ below 13. Now a larger minimum means that the case $k = 2$ is less favorable
than $k = 1$. So if we want to have $k = 2$, we must assume $t \ge 13$. For $t = 13$
we are already in the case of Fig. 3, and this holds still more for larger values
of $t$. So we have to compute $m$ from the equation $m = M_+ = M_-$.

Once more, $M_+$ and $M_-$ are given by formulas like (41) and (42). The functions
$f(w)$ and $g(w)$ are more complicated now, but still $f(w)$ is decreasing and
$g(w)$ increasing, so that the equation (43) has only one solution, viz. $w_2 = 1.864$.

D. Comparison of the results for k = 1 and 2. The minimum $m$ corresponding
to $w_1$ and $k = 1$ was, according to (41) or (42),

(47)  $m = .1779t + .868,$

and the minimum corresponding to $w_2$ and $k = 2$ is given by a similar formula,

(48)  $m = .1227t + 1.864.$

The linear expressions (47) and (48) become equal for $t_2 = 18.06$.

For $t < t_2$, the $m$ of (47) is less, which means that $k = 1$ is better. For $t > t_2$
the $m$ of (48) is less, which means that $k = 2$ is better. At the point $t_2 = 18.06$
the function $w$ jumps from the constant value $w_1 = .868$ to the constant value
$w_2 = 1.864$, and $k$ jumps from 1 to 2. The value 1.864 remains until $k = 3$
becomes better, etc. The behavior of $w$ as a function of $t$ is shown in Fig. 4
(logarithmic scales).
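
The crossing points and the slopes of the two linear expressions can be recomputed directly from the Poisson regret functions. The sketch below is not the author's procedure but a plain numerical one: it works with the Poisson mean `mu = v * w`, finds each maximum by ternary search (assuming a single maximum), and solves f(w) = g(w) by bisection. The helper names are hypothetical. The results should land near the values quoted in the text, though agreement to the last printed digit is not guaranteed.

```python
from math import exp

def maximize(h, lo, hi, iters=200):
    """Ternary search for the maximum value of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if h(m1) < h(m2):
            lo = m1
        else:
            hi = m2
    return h(0.5 * (lo + hi))

def accept_prob(mu, k):
    """P = P_0 + ... + P_{k-1} for a Poisson variable with mean mu, as in (31)."""
    term, total = exp(-mu), 0.0
    for y in range(k):
        total += term
        term *= mu / (y + 1)
    return total

def f_plus(w, k):
    """max over v > 1 of (v - 1) P, so that M_+ = t f(w) + w."""
    return maximize(lambda mu: (mu - w) / w * accept_prob(mu, k), w, w + 30.0)

def g_minus(w, k):
    """max over v <= 1 of (1 - v) Q, so that M_- = t g(w) + w."""
    return maximize(lambda mu: (w - mu) / w * (1.0 - accept_prob(mu, k)), 0.0, w)

def crossing(k, lo=0.05, hi=10.0, iters=60):
    """Bisection for the root of f(w) = g(w), equation (43)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_plus(mid, k) > g_minus(mid, k):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w1 = crossing(1)
c1 = f_plus(w1, 1)
w2 = crossing(2)
c2 = f_plus(w2, 2)
t1 = w1**2 * exp(1 + w1) / (1 + w1)   # (45) evaluated at w1
t2 = (w2 - w1) / (c1 - c2)            # where c1 t + w1 equals c2 t + w2
print(w1, c1, t1)
print(w2, c2, t2)
```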

E. The asymptotic formula for large t. If $t$ is very large, $w$ is also large and the
Poisson distribution may be approximated by a normal distribution. The
asymptotic formula for $w$ may be obtained from (28) by putting $q_0 = 1$ and
multiplying both sides by $p_0$. This gives us

(49)  $w = np_0 = .193\,t^{2/3},$

or the dotted line in Fig. 4.

Fig. 4. The $(t, w)$-diagram (marked abscissae: $t_1 = 2.60$, $t_2 = 18.06$)





At the end of Section 2, we have seen that, for large $w$, the critical number
of defectives $k$ in a sample of $n$ ought to be chosen slightly less than $w + \tfrac{1}{2}$.
We now see that this is not only true for large values of $k$ and $w$, but even for
the smallest possible values $k = 1$ and $k = 2$. The optimum for $w$ is

for $k = 1$: $w = .868$, hence $w + \tfrac{1}{2} = 1.368$;
for $k = 2$: $w = 1.864$, hence $w + \tfrac{1}{2} = 2.364$.

S. Ura extended the calculations to the cases $k = 3$, 4 and 5. His value $t_2
= 18.3$ is slightly different from mine.

F. Once more k = 1. For $k = 1$, the Poisson law $P = e^{-\mu} = e^{-np}$ is a good
approximation to the exact formula

(50)  $P = (1 - p)^n$

even if $n$ is not large, provided $p$ does not exceed .45. Now the largest value of
$v = p/p_0$ entering into our calculations was $v_1 = 1 + w_1^{-1} = 2.15$, so, if $p_0$ does
not exceed .2, $p$ will not exceed .43 and the Poisson approximation is justified.
Hence if $t$ lies between 2 and 18, and $p_0$ does not exceed .2, the results for $k = 1$
obtained by the Poisson approximation may be applied without any modification
even to small samples.

If $n$, as computed by this method, is less than 4, a direct calculation of the
maximum regret $M$ for $n = 1, 2, 3, 4$ is necessary. Also, if $p_0$ exceeds .2, we
have to apply the exact binomial distribution. The exact formulas for the
regrets $R_+$ and $R_-$ are

$R_+ = (a - b)(p - p_0)q^n + nf,$
$R_- = (a - b)(p_0 - p)(1 - q^n) + nf.$

The maxima $M_+$ and $M_-$ are readily found by differentiation with respect
to $p$. Very often, the computation of $M_-$ is not necessary, $M_-$ being obviously
less than $M_+$. The maximum of $M_+$ and $M_-$ is $M_n$. Thus, we may calculate
the sequence $M_1, M_2, \cdots$. As soon as we find an $M_{n+1} \ge M_n$, we may stop
the calculation and take $n$ as the best value. If $n$ turns out to be large, we may
replace $n$ by a continuous variable and differentiate with respect to $n$.
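
The direct calculation for k = 1 can be sketched as follows. For R_+ the stationary point is available in closed form: setting the derivative of (p - p_0) q^n to zero gives p = (1 + n p_0)/(n + 1). For R_- the sketch simply uses a grid search, which is an assumption of convenience rather than the method of the text. The constants in the example are hypothetical and are chosen with p_0 above .2, where the exact formulas are called for.

```python
def max_regret_plus(n, a, b, f, p0):
    """M_+ for k = 1: maximum over p of (a - b)(p - p0) q^n + n f."""
    p = (1.0 + n * p0) / (n + 1.0)        # stationary point of (p - p0)(1 - p)^n
    return (a - b) * (p - p0) * (1.0 - p) ** n + n * f

def max_regret_minus(n, a, b, f, p0, steps=2000):
    """M_- for k = 1 by a grid search over 0 <= p <= p0."""
    return max((a - b) * (p0 - p) * (1.0 - (1.0 - p) ** n) + n * f
               for p in (p0 * i / steps for i in range(steps + 1)))

def best_n(a, b, f, p0, n_max=1000):
    """First n with M_{n+1} >= M_n, where M_n is the larger of M_+ and M_-."""
    def M(n):
        return max(max_regret_plus(n, a, b, f, p0),
                   max_regret_minus(n, a, b, f, p0))
    for n in range(1, n_max):
        if M(n + 1) >= M(n):
            return n, M(n)
    raise ValueError("no minimum found below n_max")

# Hypothetical constants: a - b = 1, f = 0.01, p0 = 0.3.
n_best, M_best = best_n(a=1.0, b=0.0, f=0.01, p0=0.3)
print(n_best, M_best)
```

Computing both maxima costs little, and it guards against the cases in which M_- is not the smaller of the two.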


4. The buyer’s problem. 

A. The loss functions. Suppose a buyer gets a product in lots from the pro- 
ducer. If he accepts a lot, he pays a fixed price and uses the product for his 
own purposes or sells it to others, making a fixed profit minus a loss proportional 
to the number of defectives. The fixed profit does not enter into our calculations; 
we are only concerned with the loss due to defective units, viz. 


(51) L’ = ap. 


If the buyer rejects the lot, he returns it to the producer. In this case, he misses
his profit and may have to pay for the transportation, which means that he has
a fixed loss

(52)  $L'' = c$

in the case of rejection.


To both losses, (51) and (52), the cost of inspection fn must be added. Hence 
the expected loss is 


(53) L = PL’ + QL” + fn = apP + cQ + fn. 


Comparing (51) and (52) with (1) and (2), we see that the minimum loss 
problem is the same as before, only with b = 0. 


Fig. 5

B. The cost per non-defective unit. The expressions (51) and (52) represent
the buyer’s loss for every lot he has ordered. However, what really matters is
his cost per non-defective unit. We may assume that defective units are
worthless to him, and that every year he needs a certain number of non-defective
units. Now what is the average price he has to pay for these?

Following Hamaker [4], I shall use the following notation:

$f_r = \int Q\, dF(p)$ = average fraction of lots rejected,
$f_{da} = \int pP\, dF(p)$ = average fraction defective accepted,
$f_{ea} = \int (1 - p)P\, dF(p)$ = average fraction effective accepted,
$N$ = number of units in a lot.

All integrals are from 0 to 1.

The average number of non-defectives accepted per lot is

(54)  $Nf_{ea} = N(1 - f_r - f_{da}).$

Let the price to be paid for an accepted lot be $a$, and the cost of a rejected lot
$c$. The average cost of a lot is

(55)  $(1 - f_r)a + f_rc + fn = a - f_r(a - c) + fn.$

Hence the average cost of a non-defective unit is

(56)  $K = [a - f_r(a - c) + fn]/N(1 - f_r - f_{da}).$

We may assume $f_r$ and $f_{da}$ to be small as compared with 1, and $fn$ small as
compared with $a$. Neglecting powers and products of small terms, we obtain

(57)  $NK = a(1 - f_r((a - c)/a) + (fn/a) + f_r + f_{da}) = a + f_{da}a + f_rc + fn.$

In order to minimize $K$ for given $N$, we have to minimize the expression

(58)  $f_{da}a + f_rc + fn = \int (apP + cQ + fn)\, dF(p).$

Now this is just the expectation of the loss $L$ of equation (53),

(59)  $E(L) = \int L\, dF(p).$


Hence, no matter whether we start with the loss per lot or with the loss per 
non-defective unit, the buyer’s minimum loss problem is just the same as the 
producer’s, only with b = 0. Hence, the minimum regret solutions, obtained in 
Section 2 and Section 3 for non-sequential tests, may be applied without any 
modification to the buyer’s problem. 

The method of approximation used in passing from the fractional expression
(56) to the linear expression (57) is due to Hamaker [4]. The same method can
be applied whenever factors such as $1 - f_{da}$ or $1 - f_r - f_{da}$ occur in the
denominator of a cost formula. As an example, Horsnell’s cost formula for
non-destructive inspection ([7], Formula (1)) may be mentioned.
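
As a rough numerical check of this linearization, the following Python sketch compares N times the fractional expression (56) with the linear form (57). The function names and all numerical inputs are illustrative assumptions, not figures taken from the text.

```python
def exact_nk(a, c, f, n, f_r, f_da):
    """N*K computed from the fractional expression (56)."""
    return (a - f_r * (a - c) + f * n) / (1.0 - f_r - f_da)


def linear_nk(a, c, f, n, f_r, f_da):
    """N*K computed from the linear expression (57), first-order terms only."""
    return a + f_da * a + f_r * c + f * n


# Hypothetical inputs: price a, cost of a rejected lot c, inspection cost f
# per unit, sample size n, and small fractions f_r and f_da.
params = dict(a=100.0, c=20.0, f=0.1, n=50, f_r=0.02, f_da=0.01)
exact = exact_nk(**params)
approx = linear_nk(**params)
relative_error = abs(exact - approx) / exact
print(exact, approx, relative_error)
```

With fractions of a few per cent, as assumed here, the two expressions differ only by second-order terms, which is the situation the approximation is meant for.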

C. Sequential tests. For sequential tests there is an essential difference between 
the producer’s and the buyer’s loss function. If the buyer finds many defectives, 
his best action is to stop sampling and to return the product to the producer. 
But if the producer finds many defectives, and if inspection is not destructive, 
he may go on sampling without increasing his loss, because he has to inspect the 
whole lot and to eliminate the defectives anyhow. 

Therefore, a producer’s sequential test will have an acceptance line but no 
rejection line. As long as we do not cross the acceptance line, sampling goes on 
until the whole lot is inspected. The sampling plan is defined by an increasing
sequence of numbers $n_0 < n_1 < n_2 < \cdots$. The lot is accepted when a sample of
$n_0$ contains no defectives, or when a sample of $n_1$ contains just 1 defective, etc.
Anscombe [8] investigated sequential sampling plans of this type, in which the
numbers $n_k$ form an arithmetical sequence


(60) $n_k = n_0 + kd.$
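
The acceptance rule for a plan of the form (60) can be sketched in Python as follows. This is only an illustration of the rule as stated above; the function name, the boolean encoding of defectives and the example lots are assumptions.

```python
def run_acceptance_plan(items, n0, d):
    """Inspect items one by one under the plan n_k = n0 + k*d.

    items: sequence of booleans, True meaning a defective unit.
    Returns (accepted, inspected). The lot is accepted as soon as the number
    inspected reaches n_k with k defectives found so far; otherwise the whole
    lot is inspected and accepted is False.
    """
    if n0 < 1 or d < 1:
        raise ValueError("n0 and d must be positive integers")
    defectives = 0
    for inspected, is_defective in enumerate(items, start=1):
        if is_defective:
            defectives += 1
        if inspected >= n0 + defectives * d:
            return True, inspected
    return False, len(items)


# Hypothetical lots of ten units with n0 = 3 and d = 2.
clean_lot = [False] * 10
one_defect = [False, True] + [False] * 8
all_defect = [True] * 10
print(run_acceptance_plan(clean_lot, 3, 2))
print(run_acceptance_plan(one_defect, 3, 2))
print(run_acceptance_plan(all_defect, 3, 2))
```

There is no rejection line in this sketch: a lot that never reaches the acceptance line is simply inspected in full, as described for the producer.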


Quite generally, let $P_k$ be the probability of acceptance with $k$ defectives in
a sample of $n_k$. The loss in this case would be


(61) $L_k = ap + f n_k,$
and, in the case of 100% inspection, 


(62) L’ = bp +c. 







For given p, the loss expectation is 
(63) $L = \sum P_k L_k + QL' = apP + \sum f n_k P_k + (bp + c)Q.$


The problem is, once more, to minimize the loss expectation 


(64) $E(L) = \int L\,dF(p)$


or, if F(p) is not known, to minimize the maximum of the regret 
(65) $R = L - L_{\min}.$


For the case of an arithmetical sequence (60), good approximations to $P$, $Q$
and $\sum n_k P_k$ are obtainable from Wald’s theory [10]. I do not know whether an
arithmetical sequence is most economical.

In the case of a buyer who is entitled to return rejected lots, a sequential test 
may have an acceptance line and a rejection line. The loss function is 


(66) L = apP + cQ + fE(n). 


The lot size will be assumed to be large as compared with E(n), so that in all 
probability a decision is reached long before the lot is exhausted. If the accept- 
ance and rejection lines are straight and parallel, we have a Wald probability 
ratio test. I am inclined to believe that the minimum regret tests corresponding 
to the loss function (66) are just probability ratio tests, but I cannot prove it. 
It would be very interesting to calculate the parameters of these tests as
functions of a, c and f.

If the lot size is not very large as compared with E(n), we have to use, in 
working out a probability ratio test, the hypergeometric distribution instead of 
the binomial one; see [9]. 


5. Application of game theory. Until now, we have considered the producer’s 
and the buyer’s problem separately. However, we may combine the two points 
of view and regard the whole transaction as a two-person non-zero sum game. 

A. The rules of the game. The producer may inspect a sample and may decide 
to examine the whole lot and to eliminate the defectives, or he may send the 
product as it is. 

The buyer may inspect a sample and may return the product as soon as he 
finds at least one defective, or he may accept the product and use it or sell it 
to others. 

The producer pays a constant production cost and gets a fixed price if the 
product is accepted. If his own inspection does not accept it, he is still able to 
make the fixed price minus bp + c. If the buyer returns the lot, the producer 
gets the fixed price minus bp + c + c’. In any case, he has to pay fn for the 
inspection of a sample of n. 

The buyer pays the fixed price if he accepts the sample. By using or selling
it himself, he makes a fixed profit minus ap. If he returns the product, he has
to pay c” for cost of transportation. Besides, he has to pay fm for the inspection
of a sample of m.

B. The three players and their strategies. In order to apply the theory of games, 
we may introduce “Nature” as a third player, whose profit or loss is such that 
the sum of the three profits is zero. Nature may influence the defective fraction 
p by causing accidents in the production process. The producer can do nothing 
against this except repairing the damage, so that next day the odds for a new 
accident are exactly what they were before. 

So a “move” of Nature means a value of p, valid one day. The difference 
between Nature and the other players is that Nature is not interested in in- 
creasing her profit. Nature’s only “strategy” is to produce random numbers p 
according to a distribution function F(p). 

A strategy of the producer or buyer means an inspection plan. The buyer’s 
best strategy has been investigated already in Section 4. If the buyer knows by 
previous experience the distribution G(p) of the p-values the producer sends 
him, he may find an inspection plan which minimizes the expectation value of 
his loss 


(67) $E(L) = \int L(p)\,dG(p).$


If the function G(p) is not known, the buyer may adopt the minimax regret 
strategy. 

The producer’s strategy has to be considered anew, because the rules of the 
game are more complicated than those adopted in Section 1. The formulas for 
the producer’s loss if his own inspection rejects the product are the same as in 
Section 1. On the other hand, if the product is sent to the buyer, the producer’s 
profit or loss depends on the defective fraction p and on the buyer’s strategy. 
Even if we assume the buyer’s strategy to be known, the producer’s loss expec- 
tation will be a rather complicated expression. The assumption made in Sec- 
tion 1 that the loss expectation is proportional to p, may be a useful 
approximation. 

C. The possibility of coalitions. Until now, we have investigated the strategy 
of the three players separately. Next we have to consider the possibility of two 
players forming a coalition with the aim of making their joint profit as large as 
possible. 

Of course, a coalition of one of the thinking players with Nature with the aim 
of ruining the other one makes no sense, but the producer and the buyer may 
well agree upon a combined sampling plan which would maximize the sum of 
their profits. 

By combining the two sampling inspections into one, the two partners can 
avoid the additional losses c’ and c” which arise when the product is sent and
returned to the producer. This means: The inspection has to be made only at 
the producer’s, e.g. by a neutral agent. If n is the total size of the sample, the 
combined loss of the producer and the buyer is 





$L_a = ap + fn$, if accepted,


and

$L_r = bp + c$, if rejected.


Now this combined loss function is exactly the same as the producer’s loss 
function adopted in Section 1. Hence we may apply the theory developed in 
Sections 1-3 to find the best strategy of the coalition. 


REFERENCES 
1. S. Moriguti, “Notes on sampling inspection plans,” Reports of Statistical Application
Research, Union of Japanese Scientists and Engineers, Vol. 3 (1955), pp. 99-121.
2. S. Ura, “Minimax approach to a single sampling inspection,” Reports of Statistical
Application Research, Union of Japanese Scientists and Engineers, Vol. 3 (1955),
pp. 140-148.
3. L. Weibull, “A method of determining inspection plans on an economic basis,” Proc.
Internat. Stat. Conf., India, 1951.
4. H. C. Hamaker, “Economic principles in industrial sampling,” Bull. Internat. Stat.
Inst., Vol. 33 (1951), p. 105.
5. F. J. Anscombe, “The cost of inspection,” Statistical Method in Industrial Production,
Sheffield Conference, Roy. Stat. Soc., 1950, pp. 61-67.
6. H. C. Hamaker, “Some basic principles of sampling inspection by attributes,” Applied
Stat., Vol. 7 (1958), pp. 149-159.
7. G. Horsnell, “Economical acceptance sampling schemes,” J. Roy. Stat. Soc., Vol. 120
(1957), pp. 148-191.
8. F. J. Anscombe, “Linear sequential rectifying inspection for controlling fraction
defective,” J. Roy. Stat. Soc., Suppl., Vol. 8 (1946), pp. 216-222.
9. A. Hald, “The compound hypergeometric distribution and single sampling inspection
plans based on costs,” accepted for publication in Technometrics.
10. A. Wald, “Sequential tests of statistical hypotheses,” Ann. Math. Stat., Vol. 16 (1945),
pp. 117-186. Wald’s methods may also be applied in the absence of a rejection
line (limiting case a = 0 in Wald’s notation).





SIMPLIFIED ESTIMATION FROM CENSORED NORMAL SAMPLES 


By W. J. Dixon
University of California, Los Angeles 


0. Summary. Estimators of mean and standard deviation for censored normal 
samples which are based on linear systematic statistics and which use simple 
coefficients are almost as efficient as estimators using the best possible
coefficients. Estimators are given for samples of size N ≤ 20 for censoring at one
extreme and for several types of censoring at both extremes.


1. Introduction. A censored sample is a sample lacking one or more observa- 
tions at either or both extremes with the number and positions of the missing 
observations known. Censoring may take place naturally, i.e., an observation
has a magnitude known only to be more extreme than the other observations in 
the sample. Censoring may also be imposed by the experimenter who from past 
experience knows that extreme observations are so unreliable that their magni- 
tudes should not be used as observed. The experimenter may impose censoring 
to reduce the duration of an experiment and obtain estimates before the extreme 
cases are determined. Estimation of the mean and standard deviation of a normal 
distribution from a sample which is censored has been considered by Sarhan and 
Greenberg [1], who obtained coefficients for best linear systematic statistics. 
They also record efficiencies of these estimators compared to the case of no 
censoring. Winsor [4] and perhaps others have suggested using for the magnitude 
of an extreme, poorly known, or unknown observation the magnitude of the 
next largest (or smallest) observation. We shall show that when symmetry is 
maintained (or proper adjustment is made) this practice results in estimators of 
the mean whose efficiencies are scarcely distinguishable from those of best linear 
estimators. For non-symmetrical censoring, it is demonstrated that optimum
simple estimators of the mean result from these “Winsorized” estimators. Also
presented are estimators of the standard deviations using one or two ranges (not 
necessarily symmetrical) which have efficiency .94 or greater when compared 
with the best linear systematic statistics. 

The variances of the proposed estimators were computed from an original
21-decimal tabulation of the means, variances and covariances of the order statistics
made available by Dan Teichroew. These tables are described in reference [5]. 
The efficiencies are the ratios of variances of corresponding estimators given by 
Sarhan and Greenberg [1]. 


2. Symmetrical censoring. Estimation of mean. If natural or imposed censor- 
ing of the sample results in the same number of observations censored from each 
extreme of the sample the practice of using for each missing observation the 
magnitude of its nearest neighbor whose magnitude is known has a minimum 


Received August 24, 1959; revised January 4, 1960. 


TABLE I
Relative efficiency for estimate $m_w$ compared with best linear systematic statistic, when
censoring involves i − 1 observations at one extreme and i observations at the other extreme.

[Table body not recoverable from the source.]

relative efficiency of .99912 (this occurs for N = 20, i = 4) when compared with
the best linear systematic statistic, BLSS, as given by Sarhan and Greenberg [1]
for N ≤ 20. For i observations censored at each extreme this estimator is


$m_w = [(i + 1)x_{i+1} + x_{i+2} + \cdots + x_{N-i-1} + (i + 1)x_{N-i}]/N.$


Efficiency is defined here as the ratio Var(BLSS)/Var($m_w$). Table III of
reference [1a] and Table II of reference [1b] may be used for Var($m_w$) to three
or four figures of accuracy since the efficiency is virtually 1.000 for all cases of
symmetrical censoring for N ≤ 20.


3. Almost symmetrical censoring. Estimation of mean. If one more observa- 
tion is censored from one extreme than from the other extreme one may consider 
the simple procedure of dropping another observation to symmetrize censorship 
and proceed as in Section 2. Efficiencies of the resulting estimators compared with 
BLSS are given in Table I. For each i, the efficiencies first decrease and then
increase with increasing N, and the minimum increases with i from .962 for i = 1
for N ≤ 20 and i ≤ 6. It therefore seems reasonable to assume that the
efficiency is never less than .962. In the example of references [1] and [2] for the
sample of ten


——, ——, 106, 111, 119, 121, 125, ——, ——, ——





[TABLE II: table body not recoverable from the source.]


the BLSS estimate of the mean is 118.9. The estimate
$m_w = [4(111) + 119 + 121 + 4(125)]/10 = 118.4.$
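
The arithmetic of this example can be reproduced with a short Python sketch of the symmetrically "Winsorized" mean. The function name and its interface are assumptions made for illustration; the data are the four central observations left after the sample has been made symmetric with i = 3.

```python
def winsorized_mean(observed, i):
    """Mean of a sample with i observations censored at each extreme.

    observed: the N - 2*i known central observations.
    Each censored value is replaced by its nearest known neighbour, so the
    smallest and largest known observations each receive weight i + 1.
    """
    if len(observed) < 2:
        raise ValueError("need at least two known observations")
    ordered = sorted(observed)
    n_total = len(ordered) + 2 * i
    total = sum(ordered) + i * (ordered[0] + ordered[-1])
    return total / n_total


estimate = winsorized_mean([111, 119, 121, 125], i=3)
print(round(estimate, 1))  # 118.4
```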


4. Censoring entirely at one extreme. Estimation of mean. If i observations 
are censored at one extreme, one may consider dropping i observations at the 
other extreme to produce symmetry and proceed as in Section 2. For i ≤ 6 the
efficiency of this estimator is never less than .956. Since the efficiencies for each
i ≤ 6 are increasing at N = 20 it seems reasonable to assume this minimum
holds for all N with i ≤ 6. If fewer observations are dropped, some adjustment
must be made to maintain an unbiased estimator. A simple estimator which
usually has greater efficiency is


$m_a = [a x_1 + x_2 + \cdots + x_{N-i-1} + (i + 1)x_{N-i}]/(N + a - 1).$


Here a is chosen as a coefficient of $x_1$, i.e., chosen to satisfy $E(m_a) = \mu$, and the
other extreme is “Winsorized” as in the estimator $m_w$. If i is not large, $m_a$ shows
very little loss in efficiency from the BLSS, and of course it is possible to estimate
the mean for smaller sample sizes than is possible if one arbitrarily makes the


TABLE III

Relative efficiencies of estimates based on ranges of samples compared with best linear
systematic statistic for estimating standard deviation from samples censored of i
observations at each extreme. Estimate is maximum range except where noted.

  N     i = 1     i = 2     i = 3     i = 4
  4     1.000
  5     1.000
  6      .997     1.000
  7      .991     1.000
  8      .984      .999     1.000
  9      .975      .997     1.000
 10      .966      .993     1.000     1.000
 11      .966*     .989      .998
 12      .969*     .984      .997     1.000
 13      .969*     .979      .994      .999
 14      .968*     .973      .992      .998
 15      .966*     .967      .989      .997
 16      .967**    .967*     .985      .995
 17      .968**    .967*     .981      .993
 18      .968**    .967*     .977      .991
 19      .968**    .966*     .973      .988
 20      .966**    .965**    .969      .986

[The entry for N = 11, i = 4 and the columns for i = 5 and i = 6 are missing in the source.]

* Efficiency for estimate based on $(x_{N-i} - x_{i+1}) + (x_{N-i-1} - x_{i+2})$.
** Efficiency for estimate based on $(x_{N-i} - x_{i+1}) + (x_{N-i-2} - x_{i+3})$.





TABLE IV

Relative efficiencies of estimates based on ranges of samples compared with best linear
systematic statistic for estimates of standard deviation from samples censored for i − 1
observations at one extreme and i observations at the other extreme. Estimate is based on
maximum range except where noted. Efficiencies and estimates for i = 1 as given in Table V.

[Table body not recoverable from the source.]

* Estimate is based on $(x_{N-i} - x_i) + (x_{N-i} - x_{i+1})$.
** Estimate is based on $(x_{N-i} - x_i) + (x_{N-i-1} - x_{i+2})$.


sample symmetric and uses $m_w$ as suggested above. Table II lists the efficiencies
for these two types of estimators and lists the values of a for the estimator $m_a$.


5. Estimation of standard deviation. Symmetrical censoring. Any estimator 
of the standard deviation based on a sample whose extremes are censored has low 
efficiency since the observations of greatest importance are not available. For 
example if one extreme observation in a sample of 10 is missing the BLSS has 
efficiency .837 compared with the sample standard deviation based on all ten 
observations; for one extreme observation censored from a sample of five the 
efficiency is .677. Furthermore, the situation rapidly deteriorates for more obser- 
vations censored. It seems of interest to investigate whether an estimate of 
standard deviation based on ranges will more than slightly depress these effi- 
ciencies. 

For i observations censored from each extreme an estimate of the standard
deviation based on an optimum choice of one or two ranges has minimum relative
efficiency .965 compared with the BLSS for i ≤ 6 and N ≤ 20. Table III
indicates these estimators and efficiencies. For similar estimators for the case of 
no censoring see [3]. This table and also Tables IV and V indicate the range or 







ranges to be used for the optimum estimator of this type. An appropriate multi- 
plier must be used to give an unbiased estimate of the standard deviation. Table 
III indicates the use of two ranges in certain cases. The increase in efficiency for
two ranges can be seen by comparison with the efficiency of the maximum range
alone, which for i = 1 is .916 for N = 15 and .868 for N = 20, and for i = 2 is
.936 for N = 20.


6. Almost symmetrical censoring. Estimation of standard deviation. For i — 1 
observations censored at one extreme and i observations censored at the other 
extreme an estimator of the standard deviation based on an optimum choice of 


TABLE V 


Relative efficiencies of estimates based on ranges compared with best linear systematic
statistic for estimating standard deviation from samples censored of i observations at
upper extreme (for lower extreme replace $x_i$ by $x_{N+1-i}$).





N 


.N-3 | 1N-4 IN-5 1N-6 


1.00 | 6 1.000 | 7 1.00 | 8 1.000 
975 | 7 971 | 8 967 


1,2, -6,N-6 
1,2, N — 2, N — 21,2, N—3,N —3]1,2,N—4,N—411,2,N-5,N-5 9 968 
6 .968 7 -974 8 .977 9 -979 10 -981 
6 ‘ -981 8 -983 9 .984 10 -984 ll -983 
7 ; .986 9 .986 | 10 984 11 -982 | 12 -980 
8 ‘ .986 | 10 -985 | 11 981 12 .978 13 .974 
9 ‘ .983 | 11 -980 | 12 .976 
wey 977 |12 .974 1,3, N — 5, N — 81,3, N — 6,N —6 
.970 
1,2,N—2,N-1 1,3,N-3,N—3) 13 974 





ll .963 1,3, N — 2, N — 2) 13 973 | 14 973 
13 -966 | 14 971 | 15 -970 
1,3, N — 2,N — 1) 14 963 | 15 -968 | 16 -967 
-964 15 958 | 16 -963 | 17 . 962 
946 17 -958 | 18 -957 
-962 |1,3,N —3,N —2) 18 -953 | 19 952 
958 : 19 -947 | 20 -947 

-954 : 20 -941 

949 


1,3,N -3,N—1 
18 945 
19 -941 


20 .938 







one or two ranges has minimum efficiency .965 compared to BLSS for 1 < i ≤ 6
and N ≤ 20. These estimators and efficiencies are given in Table IV. Table IV
indicates the use of two ranges for certain cases. The increase in efficiency for
two ranges can be seen by comparison with the efficiency of the maximum range
alone, which for i = 1 is .937 for N = 15 and .896 for N = 20, and for i = 2
is .950 for N = 20.


7. Censoring entirely at one extreme. Estimation of standard deviation. For i 
observations censored at one extreme an estimator of the standard deviation 
based on an optimum choice of one or two ranges has minimum efficiency .937
compared to BLSS for i ≤ 6 and N ≤ 20. These estimators and efficiencies are
given in Table V. The estimators are indicated by the order of the observations
used in the estimator. For example, the designation 1, 3, N − 5, N − 5 for N = 15
indicates the estimator $K(2x_{10} - x_3 - x_1)$, where $K^{-1} = E(2x_{10} - x_3 - x_1)$
and the expectation applies to the unit normal distribution. For this example, $K^{-1}$ =
2(.33530) + .94769 + 1.73591 = 3.35420. The optimum solution for most cases
requires the use of an extreme observation at the censored end with doubled
weight rather than two different observations.
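
The constant in this example can be checked numerically. The Python sketch below computes expected values of unit normal order statistics by Simpson's rule rather than from the tables of reference [5]; the function name, integration limits and step count are assumptions chosen for illustration.

```python
import math


def expected_normal_order_stat(r, n, lo=-10.0, hi=10.0, steps=4000):
    """E[x_(r)] for the r-th smallest of n unit normal observations.

    Integrates x times the density of the r-th order statistic by
    Simpson's rule (steps must be even).
    """
    coeff = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))

    def integrand(x):
        lower = 0.5 * math.erfc(-x / math.sqrt(2.0))  # P(Z <= x)
        upper = 0.5 * math.erfc(x / math.sqrt(2.0))   # P(Z > x)
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return x * lower ** (r - 1) * upper ** (n - r) * density

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return coeff * total * h / 3.0


# Designation 1, 3, N - 5, N - 5 with N = 15: estimator K(2*x_10 - x_3 - x_1).
n = 15
k_inverse = (2 * expected_normal_order_stat(10, n)
             - expected_normal_order_stat(3, n)
             - expected_normal_order_stat(1, n))
K = 1.0 / k_inverse
print(round(k_inverse, 4), round(K, 4))
```

The value of `k_inverse` can be compared with the figure 3.35420 quoted in the text.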


REFERENCES 


[1] Ahmed E. Sarhan and Bernard G. Greenberg, “Estimation of location and scale
parameters by order statistics from singly and doubly censored samples,” Ann.
Math. Stat., (I) Vol. 27 (1956) pp. 427-451 and (II) Vol. 29 (1958) pp. 79-105.

[2] Bernard G. Greenberg and Ahmed E. Sarhan, “Applications of order statistics to
health data,” Amer. J. of Public Health, Vol. 48 (1958) pp. 1388-94.

[3] W. J. Dixon, “Estimates of the mean and standard deviation of a normal population,”
Ann. Math. Stat., Vol. 28 (1957) pp. 806-809.

[4] Charles P. Winsor, Personal communication.

[5] D. Teichroew, “Tables of expected values of order statistics and products of order
statistics for samples of size twenty and less from the normal distribution,”
Ann. Math. Stat., Vol. 27 (1956) pp. 410-426.





ESTIMATING THE MEAN OF A FINITE POPULATION 
By J. Roy and I. M. Chakravarti
Indian Statistical Institute, Calcutta 


0. Introduction and summary. In sampling from a finite population, the
non-existence of a uniformly minimum variance unbiased estimator for the mean $\mu$
has been demonstrated by Godambe [3], and the inadmissibility of the sample
mean as an estimator for $\mu$, when sampling is with replacement and equal
probabilities, has been proved by Des Raj and Khamis [2] and by Basu [1].

In this paper, the problem of unbiased linear estimation of $\mu$ with minimum
variance is considered for a very general scheme of sampling. An admissible
estimator is obtained, together with a complete class of estimators. It is shown
further that, for a somewhat restricted sampling scheme, amongst estimators
with variance proportional to $\sigma^2$, there does exist a best estimator which, in the
case of sampling with replacement and equal probabilities, is the same as that
considered in [1] and [2].


1. Sampling scheme and method of estimation. Consider a population
consisting of a finite number N of distinguishable elementary units $u_i$ with
associated real numbers (variate-values) $y_i$, i = 1, 2, ..., N. The mean and the
variance of the population will be denoted respectively by


(1.1) $\mu = N^{-1}\sum_{i=1}^{N} y_i$ and $\sigma^2 = N^{-1}\sum_{i=1}^{N} (y_i - \mu)^2.$


Let {U} denote a countable collection of finite or infinite sequences U(x),
x = 1, 2, ..., of the elementary units, repetitions being allowed. We shall call
each U(x) a “sampling unit”. Let $n_i(x)$ denote the number of times $u_i$ occurs
in U(x) and let
(1.2) $\nu_i(x) = 0\ (1)$ if $n_i(x) = 0\ (> 0)$.

To avoid triviality, it will be assumed that there is no sampling unit which con- 
tains all the N different elementary units. 

The sampling scheme to be considered is as follows: Only one of the sampling
units is to be selected, the probability of selecting U(x) being p(x) so that
$\sum p(x) = 1$ (summation being over all sampling units), and the variate-values
for all the elementary units in the selected sampling unit are to be determined.
The total number of elementary units in U(x), counting repetitions, is thus
$n(x) = \sum_{i=1}^{N} n_i(x)$, and the number of distinct elementary units in U(x) is


(1.3) $\nu(x) = \sum_{i=1}^{N} \nu_i(x).$


The serial number of the selected sampling unit is thus a random variable X
with probability distribution given by


(1.4) Prob (X = x) = p(x).


Received March 12, 1959; revised October 6, 1959.


It is to be noted that the sampling scheme considered is of a very general
type; n(x) and $\nu(x)$ need not be independent of x, and n(x) may not even be
finite. This kind of formulation is useful because it covers cases of sequential
sampling. Consider, for instance, the following scheme of sampling: Draw 
elementary units, one by one, with replacement, until two different elementary 
units are obtained, where the probability of getting a particular unit may vary 
from draw to draw. The sample size counting repetitions may be infinite, though 
the effective sample size is only two. Our complete class Theorem 2.2 shows that 
in this case, to estimate the mean, one may disregard the multiplicities and the 
order of drawing of the two elementary units. 

If U(x) happens to be selected, a linear function, call it t(x), of the variate-
values for all the elementary units in U(x), will be taken as the estimate for $\mu$.
In general, t(x) can be written as $t(x) = \sum_{i=1}^{N} y_i a_i(x)$, where $a_i(x)$
(i = 1, 2, ..., N; x = 1, 2, ...) are pre-determined real numbers with the
restriction that $a_i(x) = 0$ whenever $n_i(x) = 0$. The estimator is thus the random
variable


(1.5) $T = t(X) = \sum_{i=1}^{N} y_i a_i(X).$


In order that the expectation of T may be equal to $\mu$ for all values of
$y = (y_1, y_2, \ldots, y_N)$, a necessary and sufficient condition is that


(1.6) $E[a_i(X)] = N^{-1}$, i = 1, 2, ..., N.


The further restriction $E[a_i(X)^2] < \infty$, i = 1, 2, ..., N, is imposed so that the
variance of T may be finite for all finite values of y. A random variable T
satisfying these conditions will be called a linear unbiased estimator of $\mu$.

Obviously (1.6) cannot hold unless for every i (i = 1, 2, ..., N) there exists
at least one x for which both $n_i(x) > 0$ and p(x) > 0; henceforth this will be
tacitly assumed. (Any $u_i$’s for which $n_i(x)p(x) = 0$ for all x are effectively
outside of the sampled population.)

The variance of a linear unbiased estimator T of $\mu$ is obviously given by


(1.7) $V(T) = \sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j \delta_{ij},$


where $\delta_{ij} = \mathrm{Cov}[a_i(X), a_j(X)] = E[a_i(X)a_j(X)] - N^{-2}$.

2. An admissible estimator and a complete class of estimators. Of two
different linear unbiased estimators T and T′ of $\mu$, T will be said to be at least as
good as T′ if
(2.1) $V(T) \le V(T')$


holds for all y; T will be said to be better than T′ if (2.1) holds for all values of
y with strict inequality for at least one value of y. In a given class of linear
unbiased estimators of $\mu$, T will be said to be best if it belongs to the class and is
better than any other member of the class; it will be said to be admissible if the
class does not contain a better member. A class $\mathcal{C}$ of linear unbiased estimators
of $\mu$ will be called complete, if given any linear unbiased estimator of $\mu$ not
belonging to the class $\mathcal{C}$ it is possible to find a member of $\mathcal{C}$ which is better.

It has been shown [3] that a best estimator in the class of all linear unbiased 
estimators does not exist for any sampling scheme. An admissible estimator and 
a complete class of estimators are obtained in this section. 

Let $a_i^*(x)$ be defined by


(2.2) $a_i^*(x) = \dfrac{\nu_i(x)}{N q_i},$

where $\nu_i(x)$ is defined by (1.2) and $q_i$ stands for the probability that the
elementary unit $u_i$ occurs in the selected sampling unit, that is $q_i = E[\nu_i(X)]$.
Consider


(2.3) $T^* = \sum_{i=1}^{N} y_i a_i^*(X),$


which is easily verified to be a linear unbiased estimator of $\mu$. The variance of $T^*$
is given by


(2.4) $V(T^*) = \sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j \delta_{ij}^*,$


where $N^2\delta_{ij}^* = (q_{ij}/q_i q_j) - 1$ and $q_{ij}$ stands for the probability that both
$u_i$ and $u_j$ occur in the selected sampling unit; that is, $q_{ij} = E[\nu_i(X)\nu_j(X)]$,
$q_{ii} = q_i$.
THEOREM 2.1. $T^*$ defined by (2.3) is admissible in the class of all linear unbiased
estimators of $\mu$.

PROOF. If not, there exists a better linear unbiased estimator of $\mu$, say T given
by (1.5). Then, from (1.7) and (2.4) one gets


(2.5) $V(T^*) - V(T) = \sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j (\delta_{ij}^* - \delta_{ij}),$


which must be at least positive-semidefinite. But it is easy to verify that
$\delta_{ii}^* - \delta_{ii} = -E\{[a_i^*(X) - a_i(X)]^2\}$ is not positive: this contradicts the assumption
that T is better.

To obtain a complete class of estimates proceed as follows. Let
$J = (j_1, j_2, \ldots, j_m)$ denote a non-empty proper subset of the set of integers
(1, 2, ..., N). There are thus $2^N - 2$ such subsets. Let $S_J$ stand for the set of the
serial numbers x of those sampling units U(x) which contain the elementary
units $u_{j_1}, u_{j_2}, \ldots, u_{j_m}$ and these only, thus


(2.6) $S_J = \{x \mid \nu_i(x) = 1\ (0) \text{ for } i \in J\ (i \notin J)\}.$





Let $\mathcal{C}_0$ denote the class of linear unbiased estimators T of $\mu$ for which the
coefficients $a_i(x)$ are equal for all $x \in S_J$ and for every subset J, that is, they are of
the form: $a_i(x) = b_{iJ}$ for all $x \in S_J$ and for every subset J. Thus, $\mathcal{C}_0$ is the class
of linear unbiased estimators of $\mu$ whose coefficients depend only on which
elementary units are in the sampling unit, and not on their multiplicities or
ordering.


We then have the following: 

THEOREM 2.2. The class $\mathcal{C}_0$ is complete.

PROOF. Let $T = \sum_{i=1}^{N} y_i a_i(X)$ be a linear unbiased estimate of $\mu$. Let
$\pi_J = \mathrm{Prob}(X \in S_J)$ and define


(2.7) $b_{iJ} = \sum_{x \in S_J} a_i(x)p(x)/\pi_J$ if $\pi_J > 0$, $b_{iJ} = 0$ otherwise,


and further let $\bar{a}_i(x) = b_{iJ}$ for all $x \in S_J$ and for every subset J. It is easy to see
that the estimator $\bar{T} = \sum_{i=1}^{N} y_i \bar{a}_i(X)$ belongs to the class $\mathcal{C}_0$. Also,


(2.8) $V(T) = V(\bar{T}) + \sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j \lambda_{ij},$


where $\lambda_{ij} = E[\{a_i(X) - \bar{a}_i(X)\}\{a_j(X) - \bar{a}_j(X)\}]$. Since the matrix $((\lambda_{ij}))$
is at least positive-semidefinite, $\bar{T}$ is better than T unless T itself belongs to
$\mathcal{C}_0$. This completes the proof.


3. Best estimator in a restricted class. Since there does not exist a best mem- 
ber in the class of all unbiased linear estimators we proceed to examine whether 
a best estimator exists if the class is suitably restricted. 

A linear unbiased estimator T = t(X) will be called linearly invariant if the
transformation $y_i^* = \alpha y_i + \beta$ (i = 1, 2, ..., N) of the variate values transforms
t(x) to t*(x) where $t^*(x) = \alpha t(x) + \beta$ for all x for which p(x) > 0. Obviously, a
necessary and sufficient condition for T to be linearly invariant is that


(3.1) $\sum_{i=1}^{N} a_i(x) = 1$, for all x for which p(x) > 0.


We now show by a counter-example that, even in the class of linearly invariant 
unbiased estimators, in general there does not exist a best estimator. 

Consider a population of N = 4 elementary units $u_i$ with variate-values
$y_i$ (i = 1, 2, 3, 4). Let the sampling units U(1), U(2), U(3) and U(4) be the
four possible sets of three of the elementary units, and let the probability
of selection be the same, viz. 1/4, for all the sampling units. This corresponds
to taking 3 units with equal probabilities without replacement from a
population of 4 units. It follows from Theorem 2.1 that the sample mean $T^*$
is an admissible estimator in this case. Obviously, $T^*$ is linearly invariant and
its variance is $\sigma^2/9$ where $\sigma^2 = \sum_{i=1}^{4} (y_i - \mu)^2/4$. Consider now an alternative
estimate, $T = \sum_{i=1}^{4} y_i a_i(X)$, whose coefficient-matrix $\{a_i(x)\}$ (i, x = 1, 2, 3, 4)
is given on the following page.













It is easy to verify that T is linearly invariant and unbiased, and that its vari- 
ance, $o° + 46°(y: — ys)” + $6(y: — ye) (ys — ys), can be made smaller than the 
variance of 7* by a proper choice of @ if y, ~ yz. Therefore a best invariant 
estimator does not exist in this case. 

It will now be shown that, if consideration is limited to a still smaller class of 
what we propose to call regular estimators, there does exist a best estimator, 
provided that the sampling scheme is somewhat restricted. 

A linear unbiased estimator T of $\mu$ will be called a regular estimator if its
variance is of the form


(3.2) $V(T) = k\sigma^2,$


where k is a constant independent of y. Suppose that T is of the form (1.5) so
that its variance is given by (1.7). Since (3.2) can be written as


(3.3) $V(T) = k(N - 1)N^{-2}\sum_{i=1}^{N} y_i^2 - kN^{-2}\sum_{i \ne j} y_i y_j,$


by equating coefficients in (1.7) and (3.3) we get

(3.4) $E[a_i(X)a_j(X)] = \begin{cases} kN^{-1} - (k - 1)N^{-2} & \text{if } i = j, \\ -(k - 1)N^{-2} & \text{if } i \ne j. \end{cases}$


Consequently, writing $a(X) = \sum_{i=1}^{N} a_i(X)$, one gets from (3.4), V[a(X)] = 0.
Therefore


(3.5) $\sum_{i=1}^{N} a_i(x) = 1$ for all x for which p(x) > 0.


We thus have 

THEOREM 3.1. A regular estimator is linearly invariant.

Let us now compute $M = E\sum_{i=1}^{N} [a_i(X) - \nu_i(X)/\nu(X)]^2$. By virtue of
(3.5), we get $\sum_{i=1}^{N} [a_i(X) - \nu_i(X)/\nu(X)]^2 = \sum_{i=1}^{N} [a_i(X)]^2 - 1/\nu(X)$.
Using (3.4) we then have
(3.6) $M = k(N - 1)N^{-1} + N^{-1} - E[1/\nu(X)].$


Since M is non-negative, we obtain the following:
LEMMA. For the variance of any regular estimator T of $\mu$, there exists a lower
bound $V(T) \ge K\sigma^2$ where


(3.7) $K = (N/(N - 1))E[1/\nu(X)] - (1/(N - 1)).$





MEAN OF A FINITE POPULATION 397 


This lower bound can be attained if and only if $M = 0$. But this requires that,
for every $x$ for which $p(x) > 0$, $a_i(x) = \lambda_i(x)$, where
(3.8)    $\lambda_i(x) = \nu_i(x)/\nu(x)$.
In order that the linear statistic


(3.9)    $L = \sum_{i=1}^{N} y_i \lambda_i(X)$


may be an unbiased estimator of $\mu$, a necessary and sufficient condition is that
(3.10)    $E[\nu_i(X)/\nu(X)] = N^{-1}$ for $i = 1, 2, \cdots, N$.


A sampling scheme will be called balanced if (3.10) holds. 

We thus have proved the following: 

THEOREM 3.2. In order that the lower bound for the variance of a regular estimator
of $\mu$ may be attained, a necessary and sufficient condition is that the sampling scheme
should be balanced. If the sampling scheme is balanced, the estimator $L$ defined by
(3.9) is best in the class of all regular estimators and its variance is given by $V(L) = K\sigma^2$,
where $K$ is defined by (3.7).
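
As a numerical illustration of Theorem 3.2, the following sketch enumerates every ordered sample under equal-probability sampling with replacement and checks that the mean of the distinct units has variance $K\sigma^2$, with $K$ computed from (3.7). The population values, the function name `regular_bound_check` and the choice N = 4, n = 3 are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction
from itertools import product


def regular_bound_check(y, n):
    """Enumerate all N**n ordered samples drawn with replacement.

    Returns (mu, E[L], sigma^2, K, Var[L]) where L is the mean of the
    distinct units in the sample and K is the constant in (3.7).
    Exact rational arithmetic is used throughout.
    """
    N = len(y)
    mu = Fraction(sum(y), N)
    sigma2 = sum((Fraction(v) - mu) ** 2 for v in y) / N
    prob = Fraction(1, N ** n)          # every ordered sample is equally likely

    e_inv_nu = Fraction(0)              # E[1 / nu(X)]
    mean_L = Fraction(0)
    second_moment_L = Fraction(0)
    for sample in product(range(N), repeat=n):
        distinct = set(sample)
        nu = len(distinct)              # number of distinct units nu(X)
        L = Fraction(sum(y[i] for i in distinct), nu)
        e_inv_nu += prob * Fraction(1, nu)
        mean_L += prob * L
        second_moment_L += prob * L * L

    K = Fraction(N, N - 1) * e_inv_nu - Fraction(1, N - 1)
    var_L = second_moment_L - mean_L ** 2
    return mu, mean_L, sigma2, K, var_L


mu, mean_L, sigma2, K, var_L = regular_bound_check([1, 2, 4, 9], 3)
print(K, var_L == K * sigma2, mean_L == mu)
```

For this small case $K$ is smaller than $1/n = 1/3$, the corresponding constant for the ordinary sample mean under sampling with replacement, which is the comparison taken up in Section 4.1.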


4. Application to specific sampling schemes. The usefulness of the theorems 
derived in Sections 2 and 3 will be demonstrated by considering several well 
known sampling schemes. 

4.1 Simple Random Sampling: In this case, a sample of $n$ elementary units
is drawn one by one with equal probabilities and with replacement. There are
thus $N^n$ sampling units, each consisting of $n$ of the $N$ elementary units, repetitions
being allowed. The probability of selecting any one sampling unit is $N^{-n}$.
That the sample mean $T' = \sum_{i=1}^{N} y_i m_i(X)/n$ is an inadmissible estimator follows
from the complete class Theorem 2.2. This result was obtained earlier in [1] and
[2] by proving that the estimator $T_0 = \sum_{i=1}^{N} y_i \nu_i(X)/\nu(X)$ is better. From
Theorem 3.2 we have the stronger result that $T_0$ is the best regular estimator.
An admissible estimator in this case is


$T^* = \sum_{i=1}^{N} y_i \nu_i(X) / (N\{1 - (1 - N^{-1})^n\})$,

as obtained from Theorem 2.1. This estimator was used in [3] to prove that the
sample mean $T'$ is not better than $T^*$. However, $T^*$ is not even a linearly
invariant estimator.

4.2 Random Sampling Without Replacement. In this case a sample of n ele- 
mentary units is drawn one by one with equal probabilities but without replace- 
ment. There are thus N!/(N — n)! sampling units, each consisting of a com- 
bination of n of the N elementary units, and each such sampling unit has the 
probability (N — n)!/N! of selection. It is easily seen from Theorems 2.1 and 
3.2 that the sample mean in this case is admissible and best in the regular class. 









The counter-example in Section 3 however demonstrates that a best invariant 
estimator does not exist in general. 

4.3 Sampling for $\nu$ Distinct Units. In this case elementary units are drawn one
by one with equal probabilities and with replacement, until $\nu$ distinct elementary
units are drawn, the total sample size being thus a random variable. It is seen
from Theorem 2.2 that the sample mean $T = \sum_{i=1}^{N} y_i m_i(X)/n(X)$ is inadmissible.
This was proved in [1] by showing that the estimator
$T^* = \sum_{i=1}^{N} y_i \nu_i(X)/\nu$ is better. It follows from Theorems 2.1 and 3.1 that $T^*$ is
admissible and best in the regular class.


REFERENCES 


[1] D. Basu, "On sampling with and without replacement," Sankhyā, Vol. 20 (1958), pp.
287-294.

[2] Des Raj and H. S. Khamis, "Some remarks on sampling with replacement," Ann.
Math. Stat., Vol. 29 (1958), pp. 550-557.

[3] V. P. Godambe, "A unified theory of sampling from finite populations," Jour. Roy.
Stat. Soc. Ser. B, Vol. 17 (1955), pp. 269-277.





A MEASURE OF PREDICTIVE PRECISION IN REGRESSION ANALYSIS 
By H. Linhart
South African Council for Scientific and Industrial Research, Johannesburg 


1. Introduction and summary. It has been suggested ([5], [6]) to use the ex- 
pected value, E(l), of the length, l, of a confidence interval for the variable to 
be predicted as a measure of precision of prediction in a regression analysis. 
The measure is relevant only if the predictor variables are random. 

Criticism of this particular choice of a measure of precision usually centres 
about the following questions: 

a. Why is the measure based on this and no other system of confidence inter- 
vals; what is known about optimality of this system. (The system referred to 
will be described in Section 2.) 

b. Why is it based on the physical length of the intervals and not on 
Neyman’s “shortness’’? 

c. If it is agreed that it is based on l, what justifies the choice of E(l)?

In the following a few points are raised which are, of course, not sufficient to
prove that E(l) is the only possible choice for a measure of precision, but which
indicate that the intuitive choice is not altogether unreasonable.

Not much can be said about a. It turns out, in Section 3, that the confidence 
limits discussed here are unbiased, but nothing about optimality in any sense is 
known to the author. 

To throw some light on b, Neyman's shortness of the system of confidence
intervals used is calculated in Section 3. A parameter enters the problem which
makes it impossible to use Neyman's shortness as an overall measure of precision.

With regard to c, one may argue that l is a random variable and if a single
measure of precision is needed a single characteristic of its distribution must be
used. Obvious possibilities are the mean and the median. The distribution of l
is obtained in Section 4, and it becomes apparent that the use of the median
would involve heavy numerical calculations.


2. A system of confidence intervals for $y_0$. In all, $n + 1$ independent vectors
enter the problem. All vectors have the same $(k + 1)$-dimensional normal distribution
with mean vector $\mu$ (here without loss of generality assumed to be the
zero vector) and with unknown positive definite covariance matrix $\Lambda$. It is assumed
that $n > k + 1$. The first $n$ vectors, $(x_{0\nu}, x_{1\nu}, \cdots, x_{k\nu})$, $\nu = 1, 2, \cdots, n$,
are used to estimate $\mu$ and $\Lambda$. These $n$ vectors are called "the sample". Only the
last $k$ components of the $(n + 1)$st vector, $(y_0, y_1, \cdots, y_k)$, are known. Using the
information on the structure of the distribution of $(y_0, y_1, \cdots, y_k)$, which was provided
by the first $n$ vectors, a confidence interval, corresponding to the confidence
coefficient $1 - \alpha$, for a hypothetical future observation $y_0$ is given. The length
of the confidence interval is a random variable; its expectation can be used as


Received December 26, 1958; revised October 22, 1959. 





400 H. LINHART 


measure of precision of prediction. A system of confidence intervals will now be 
described and studied; E(l) for this system will be computed and discussed. 

The following notation will be used: $L = (l_{ij})$ is the $(k + 1) \times (k + 1)$
covariance matrix in the sample,


$n l_{ij} = \sum_{\nu=1}^{n} (x_{i\nu} - \bar{x}_i)(x_{j\nu} - \bar{x}_j)$,  $i, j = 0, 1, \cdots, k$.


The $k \times k$ submatrix of $L$ comprising only the elements $l_{ij}$ with $i, j = 1, 2, \cdots, k$
is denoted by $L_0$, and $L_0^{-1} = (l_0^{ij})$ is written for its inverse. A completely analogous
notation is used for the covariance matrix in the population, $\Lambda = (\lambda_{ij})$.
Determinants are written with vertical strokes. The double tailed $100\alpha$ per cent
point of Student's distribution with $\nu$ degrees of freedom will be denoted by
$t_\alpha^{(\nu)}$. The predicted value of $y_0$ is $\hat{y}_0 = \bar{x}_0 + \sum_{i=1}^{k} b_i(y_i - \bar{x}_i)$, where $\bar{x}_0$ and $b_i$,
$i = 1, 2, \cdots, k$, are the least squares estimates of the regression coefficients ([2],
p. 552).

The confidence intervals for $y_0$ which are used here are those which are usually
obtained under the assumption that $x_{1\nu}, x_{2\nu}, \cdots, x_{k\nu}$ and $y_1, y_2, \cdots, y_k$ are
fixed (nonrandom) variables ([8], p. 305). Under that assumption, $\hat{y}_0 - y_0$ is
$N[0, (1 + n + T)|\Lambda|/(n|\Lambda_0|)]$, where


(1)    $T = \sum_{i,j=1}^{k} (y_i - \bar{x}_i)(y_j - \bar{x}_j) l_0^{ij}$,


and

(2)    $u = \dfrac{n|\Lambda_0||L|}{|\Lambda||L_0|}$


is, independently of $\hat{y}_0 - y_0$, distributed as $\chi^2$ with $n - k - 1$ degrees of freedom.
It follows that


(3)    $t = (\hat{y}_0 - y_0) \left\{ \dfrac{|L_0|(n - k - 1)}{|L|(1 + n + T)} \right\}^{1/2}$


has Student's distribution with $n - k - 1$ degrees of freedom, and that a confidence
interval for $y_0$ is given by


(4)    $\hat{y}_0 \pm t_\alpha^{(n-k-1)} \left[ \dfrac{|L|(1 + n + T)}{|L_0|(n - k - 1)} \right]^{1/2}$.


The distribution of $t$, (3), is independent of $L_0$ and $T$, and remains therefore
unchanged if $L_0$ is a random matrix and $T$ a random variable. The probability
that the intervals (4) cover $y_0$ is therefore [3] even in the random case equal to
$1 - \alpha$.
In Section 3 it will be shown that these confidence intervals are unbiased, 
but nothing about optimality in any sense is known to the author. 







PREDICTION IN REGRESSION ANALYSIS 401 


It might also be noted here that the intervals (4) are not confidence intervals
in the classical sense, as $y_0$ is a random variable and not a parameter. Sometimes
intervals of this kind are called "prediction intervals" or "quasi confidence
intervals".


3. The shortness of the confidence intervals used. The shortness, in the sense
of Neyman, is measured by the probability that the confidence intervals cover
a value which is different from the true value of the parameter ([9], p. 371).
One must therefore find the probability that the intervals (4) cover $y_0 + \delta$,
where $\delta$ is any constant.

Recalling how the confidence intervals (4) have been obtained, one may see
that $\hat{y}_0 - y_0 - \delta$ is, conditionally, given $L_0$ and $T$,


$N[-\delta, (1 + n + T)|\Lambda|/(n|\Lambda_0|)]$.


It follows ([1], p. 113) that $(\hat{y}_0 - y_0 - \delta)^2 n|\Lambda_0|/\{(1 + n + T)|\Lambda|\}$ has the
noncentral $\chi^2$-distribution with 1 degree of freedom and noncentrality parameter


(5)    $\tau^2 = \delta^2 n|\Lambda_0|/\{(1 + n + T)|\Lambda|\}$.


The square of


(6)    $t = (\hat{y}_0 - y_0 - \delta) \left\{ \dfrac{|L_0|(n - k - 1)}{|L|(1 + n + T)} \right\}^{1/2}$


has then ([1], p. 114), again conditionally, the noncentral F-distribution with
$f_1 = 1$ and $f_2 = n - k - 1$ degrees of freedom,


(7)    $f(t^2) = \exp\{-\tau^2/2\} \sum_{\nu=0}^{\infty} \dfrac{(\tau^2/2)^\nu (f_1/f_2)^{f_1/2+\nu} (t^2)^{f_1/2+\nu-1}}{\nu!\, B(f_1/2 + \nu, f_2/2)(1 + f_1 t^2/f_2)^{(f_1+f_2)/2+\nu}}$,  $0 \leq t^2$.

The conditional probability, given $T$, that the confidence intervals (4) cover
$y_0 + \delta$, or the conditional shortness, is therefore


(8)    $s(\delta \mid T) = \int_0^{[t_\alpha^{(n-k-1)}]^2} f(t^2)\, dt^2$.


The unconditional shortness, $s(\delta)$, is the expectation of $s(\delta \mid T)$. It is thus
necessary to obtain at first the density function of $T$.

Hsu has shown ([4], p. 235, equation 12; see also [1], p. 114)
that $(n - k)T/k$ has, conditionally, given $(y_1, y_2, \cdots, y_k)$, the noncentral
F-distribution (7) with $f_1 = k$ and $f_2 = n - k$ degrees of freedom, and with
noncentrality parameter $\tau^2 = 2\lambda$,


(9)    $\lambda = (n/2) \sum_{i,j=1}^{k} y_i y_j \lambda_0^{ij}$.


The density function of $T$ is the expectation over the variables $y_1, y_2, \cdots, y_k$
of the conditional density function. Now the vector $(y_1, y_2, \cdots, y_k)$ is $N(0, \Lambda_0)$,







and it is well known that $2\lambda/n$ has a $\chi^2$-distribution with $k$ degrees of freedom
([2], p. 319, example 15). It is then easily verified that


(10)    $E(\lambda^\nu \exp\{-\lambda\}) = n^\nu \Gamma(k/2 + \nu)/\{(n + 1)^{k/2+\nu}\, \Gamma(k/2)\}$.

The density function of $T$ is found to be

(11)    $f(T) = \dfrac{(1 + n)^{(n-k)/2}\, T^{k/2-1}}{B(k/2, (n - k)/2)\,(1 + n + T)^{n/2}}$,  $0 \leq T$.


The substitution
(12)    $v = (1 + n)/(1 + n + T)$,  $0 \leq v \leq 1$,


shows that $v$ has a Beta-distribution with $k$ and $n - k$ degrees of freedom.
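
The step from (10) to the density of $T$ can be written out as follows. This is a sketch of one possible derivation, not the author's own argument: it assumes the conditional density of $T$ is the noncentral F-density (7) with $f_1 = k$, $f_2 = n - k$, $\tau^2/2 = \lambda$, rewritten in the variable $T = f_1 F/f_2$, and it uses the binomial series $\sum_\nu (a)_\nu x^\nu/\nu! = (1 - x)^{-a}$.

```latex
\begin{align*}
f(T \mid \lambda)
  &= e^{-\lambda} \sum_{\nu=0}^{\infty} \frac{\lambda^{\nu}}{\nu!}\,
     \frac{T^{k/2+\nu-1}}{B(k/2+\nu,\,(n-k)/2)\,(1+T)^{n/2+\nu}}, \\
f(T)
  &= \sum_{\nu=0}^{\infty} \frac{E(\lambda^{\nu} e^{-\lambda})}{\nu!}\,
     \frac{T^{k/2+\nu-1}}{B(k/2+\nu,\,(n-k)/2)\,(1+T)^{n/2+\nu}} \\
  &= \frac{T^{k/2-1}}{B(k/2,\,(n-k)/2)\,(n+1)^{k/2}\,(1+T)^{n/2}}
     \sum_{\nu=0}^{\infty} \frac{\Gamma(n/2+\nu)}{\Gamma(n/2)\,\nu!}
     \left\{\frac{nT}{(n+1)(1+T)}\right\}^{\nu} \\
  &= \frac{T^{k/2-1}}{B(k/2,\,(n-k)/2)\,(n+1)^{k/2}\,(1+T)^{n/2}}
     \left\{\frac{(n+1)(1+T)}{1+n+T}\right\}^{n/2} \\
  &= \frac{(1+n)^{(n-k)/2}\,T^{k/2-1}}{B(k/2,\,(n-k)/2)\,(1+n+T)^{n/2}} .
\end{align*}
```

With $v = (1 + n)/(1 + n + T)$ the last line becomes a Beta density in $v$ with parameters $(n - k)/2$ and $k/2$.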

For the expectation of $s(\delta \mid T)$ with respect to $T$ one needs, as may be seen
from (7) and (8), the expectation of $(\tau^2/2)^\nu \exp\{-\tau^2/2\}$, where $\tau^2$, as given
by (5), is a multiple of $v$. Using a well known integral representation for the
confluent hypergeometric function ([7], p. 87), one obtains easily


(13)    $E[(\tau^2/2)^\nu \exp\{-\tau^2/2\}] = \eta^\nu\, \dfrac{\Gamma\{(n - k)/2 + \nu\}\,\Gamma(n/2)}{\Gamma(n/2 + \nu)\,\Gamma\{(n - k)/2\}}\, F\{(n - k)/2 + \nu;\, n/2 + \nu;\, -\eta\}$,


where
(14)    $\eta = \delta^2 n|\Lambda_0|/\{2(1 + n)|\Lambda|\}$.
The shortness may then be calculated. It is


(15)    $s(\delta) = \int_0^{[t_\alpha^{(n-k-1)}]^2} \dfrac{\Gamma(n/2)}{(n - k - 1)\,\Gamma[(n - k - 1)/2]\,\Gamma[(n - k)/2]} \sum_{\nu=0}^{\infty} \sum_{\mu=0}^{\infty} \dfrac{(-1)^\mu \eta^{\nu+\mu} \{t^2/(n - k - 1)\}^{\nu-1/2}\, \Gamma[(n - k)/2 + \nu]\,\Gamma[(n - k)/2 + \nu + \mu]}{\nu!\,\mu!\,\Gamma(1/2 + \nu)\,\Gamma(n/2 + \nu + \mu)\,\{1 + t^2/(n - k - 1)\}^{(n-k)/2+\nu}}\, dt^2$.

The function $s(\delta)$ is difficult to study, and, in addition to that, besides
$|\Lambda|/|\Lambda_0|$, $k$ and $n$, another parameter, $\delta$, enters the problem; it seems impossible
to base an overall measure of precision of prediction on $s(\delta)$, unless it is weighted
by some arbitrary weight function of $\delta$.

Remembering that $s(\delta)$ is the expectation of $s(\delta \mid T)$, and realising that
$1 - s(\delta \mid T)$ is nothing else than the power function of Student's t-test, one may
draw some conclusions about the properties of $s(\delta)$.

The symmetrical double tailed t-test is unbiased; $s(\delta \mid T)$ has, therefore, for
fixed $k$, $n$ and $|\Lambda|/|\Lambda_0|$, for each finite $T$ an absolute maximum at $\delta = 0$; $s(\delta)$
has, therefore, for fixed $k$, $n$ and $|\Lambda|/|\Lambda_0|$, also a maximum at $\delta = 0$. The system
of confidence intervals used is therefore unbiased in Neyman's sense.

If $n_1 > n_2$, the t-test with $n_1$ degrees of freedom is uniformly more powerful
than the corresponding test with $n_2$ degrees of freedom. For each finite value
of $T$, and for fixed $n$, $\delta$ and $|\Lambda|/|\Lambda_0|$, $s(\delta \mid T)$ is thus strictly monotonically
increasing with $k$. For fixed $n$, $\delta$ and $|\Lambda|/|\Lambda_0|$, $s(\delta)$ is, therefore, strictly
monotonically increasing with $k$. This means that, if $s(\delta)$ were used as a measure
of precision, the inclusion of more variables in a regression analysis which is
not accompanied by a decrease in the residual variance $|\Lambda|/|\Lambda_0|$ would result
in a uniform (in $\delta$) deterioration of precision of prediction. This is a property
which a measure based on $s(\delta)$ would share with the measure E(l).

To defend the use of E(l), work by S. S. Wilks [10] should be mentioned here,
which uses "average shortness" as a large sample optimum property of systems
of confidence intervals. Shortness is here physical shortness, and not Neyman's
shortness; average shortness is thus completely analogous to E(l).

Neyman ([9], p. 370) writes a few sentences about physical shortness as an
optimum property and remarks in conclusion that, "The above statement may
appeal to intuition, but it is obviously too vague to be used in practice." It
would appear that he does not condemn the idea as a whole, but only stresses
practical difficulties.

One must, however, also mention two drawbacks of physical shortness: it is 
not invariant under monotone transformations; and it covers only precision, but 
not accuracy, a physically short interval may be relatively bad if it contains 
values which are very different from the true value of the estimated parameter. 


4. The distribution of the length of the confidence intervals. From (4) one
may see that the length of the confidence interval, corresponding to the confidence
coefficient $1 - \alpha$, is given by


(16)    $l = 2 t_\alpha^{(n-k-1)} \left[ \dfrac{|L|(1 + n + T)}{|L_0|(n - k - 1)} \right]^{1/2}$.

It is convenient to obtain the distribution of


(17)    $z = \dfrac{n|\Lambda_0||L|}{|\Lambda||L_0|} \cdot \dfrac{1 + n + T}{1 + n} \cdot \dfrac{1}{2} = \dfrac{u}{2v}$

at first, where $u$ is given by (2) and $v$ by (12). In Section 2 it was mentioned
that $u$ has, conditionally, given $L_0$, a $\chi^2$-distribution with $n - k - 1$ degrees
of freedom. It follows immediately, for the unconditional case, that $u$ is distributed
independently of $T$ according to the same distribution. The simultaneous
density function of $u$ and $v$ is, therefore,


(18)    $f(u, v) \propto \exp\{-u/2\}\, u^{(n-k-1)/2-1}\, v^{(n-k)/2-1} (1 - v)^{k/2-1}$,  $0 < u$, $0 \leq v \leq 1$.

Substituting $z = u/2v$, $u/2 = t$ one has


(19)    $f(z) \propto z^{-(n-k)/2-1} \int_0^z \exp\{-t\}\, t^{n-k-3/2} (1 - t/z)^{k/2-1}\, dt$.



Using the integral representation of the confluent hypergeometric function ([7],
p. 87) it is easily deduced that


(20)    $f(z) = \dfrac{B(n - k - 1/2,\, k/2) \exp\{-z\}}{B[(n - k)/2,\, k/2]\,\Gamma((n - k - 1)/2)}\, z^{(n-k-3)/2}\, F(k/2;\, n - k/2 - 1/2;\, z)$.

The distribution of $l$ may be obtained by substituting


(21)    $l = 2^{3/2}\, t_\alpha^{(n-k-1)} \left[ \dfrac{(1 + n)|\Lambda|\, z}{n(n - k - 1)|\Lambda_0|} \right]^{1/2}$.


One may see that this density is not of a very convenient form; tables of its
distribution function do not exist. Using the median of this distribution would
thus involve heavy numerical calculations. One may, for a comparison, note
the comparatively simple form of the mean


(22)    $E(l) = 2^{3/2}\, t_\alpha^{(n-k-1)}\, \dfrac{\Gamma(n/2)}{\Gamma((n - 1)/2)} \left[ \dfrac{(1 + n)|\Lambda|}{n(n - k - 1)|\Lambda_0|} \right]^{1/2}$.
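
The mean length is simple enough to evaluate directly. The sketch below assumes the mean has the form $E(l) = 2^{3/2}\, t_\alpha^{(n-k-1)}\, [\Gamma(n/2)/\Gamma((n-1)/2)]\, [(1+n)|\Lambda|/(n(n-k-1)|\Lambda_0|)]^{1/2}$, which is how the formula above is read here. The function name `expected_length` and its argument names are hypothetical, and the Student quantile is passed in by the caller (taken from a t-table) rather than computed.

```python
from math import exp, lgamma, sqrt


def expected_length(n, k, t_alpha, resid_var):
    """Mean length E(l) of the prediction interval.

    n         : number of sample vectors
    k         : number of predictor variables
    t_alpha   : double tailed Student point with n - k - 1 degrees of freedom
                (supplied by the caller, e.g. from a t-table)
    resid_var : residual variance |Lambda| / |Lambda_0|
    """
    if n <= k + 1:
        raise ValueError("the paper assumes n > k + 1")
    # Gamma(n/2) / Gamma((n-1)/2), computed on the log scale for stability.
    gamma_ratio = exp(lgamma(n / 2) - lgamma((n - 1) / 2))
    scale = sqrt((1 + n) * resid_var / (n * (n - k - 1)))
    return 2 ** 1.5 * t_alpha * gamma_ratio * scale


# Same residual variance, more predictors: the mean length increases.
# 2.052 and 2.064 are the usual 5 per cent two-tailed points for 27 and 24 d.f.
print(expected_length(30, 2, 2.052, 1.0))
print(expected_length(30, 5, 2.064, 1.0))
```

For large $n$ the value approaches $2 t_\alpha (|\Lambda|/|\Lambda_0|)^{1/2}$, the length of the interval when all parameters are known, which is a convenient sanity check on the constants.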


5. Acknowledgment. I am indebted to the referee and to Prof. W. Kruskal
for many critical remarks and helpful suggestions which, in particular, led to a
much shorter proof of the distribution of l.


REFERENCES 


[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, New York, John
Wiley and Sons, 1958.

[2] Harald Cramér, Mathematical Methods of Statistics, Princeton, Princeton University
Press, 1946.

[3] Edwin L. Crow, "Generality of confidence intervals for a regression function," J.
Amer. Stat. Assn., Vol. 50 (1955), pp. 850-853.

[4] P. L. Hsu, "Notes on Hotelling's generalized T," Ann. Math. Stat., Vol. 9 (1938),
pp. 231-243.

[5] H. Linhart, "Critère de sélection pour le choix des variables dans l'analyse de
régression," Revue suisse d'Economie politique et de Statistique, Vol. 94 (1958),
pp. 202-232.

[6] H. Linhart, "A criterion for selecting variables in a regression analysis,"
Psychometrika, Vol. 25 (1960), pp. 45-58.

[7] Wilhelm Magnus and Fritz Oberhettinger, Formulas and Theorems for the Functions
of Mathematical Physics, Chelsea, New York, 1954.

[8] Alexander McFarlane Mood, Introduction to the Theory of Statistics, New York,
McGraw-Hill Book Co., 1950.

[9] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory
of probability," Phil. Trans. Roy. Soc. London, Ser. A, Vol. 236 (1937), pp. 333-380.

[10] S. S. Wilks, "Shortest average confidence intervals from large samples," Ann. Math.
Stat., Vol. 9 (1938), pp. 166-175.





A GENERALIZED PITMAN EFFICIENCY FOR NONPARAMETRIC 
TESTS 


By H. Witting¹
University of Freiburg i. Br.² and University of California, Berkeley


0. Summary. Asymptotic expressions up to terms of order $n^{-1}$ are given for
the efficiency of the Wilcoxon two-sample test relative to the standard-normal 
test and t-test for nearby alternatives. The first term is the well-known Pitman 
efficiency ; the remaining terms are corrections for finite sample sizes. Efficiency 
values are given for finite sample sizes in the case of normal and rectangular 
distributions, and comparisons of the asymptotic with the exact efficiency values 
are made. In general, the Wilcoxon test is shown to be nearly as good locally 
for moderate sample sizes as it is known to be asymptotically. A similar analysis 
is performed for the single-sample sign test. 


1. The concept of efficiency. Let $X_1, \cdots, X_m$ and $Y_1, \cdots, Y_n$ be independent
and identically distributed according to the continuous distributions $F(x)$ and
$G(x) = F(x - \theta)$, respectively. Let $p_{m,n}(\theta)$ and $\beta_{m,n}(\theta)$ denote the power of
two tests for the hypothesis $\theta = 0$ against the alternative $\theta > 0$ at the same level
of significance $\alpha$. Then the efficiency of the first test relative to the second (for
given values of $\theta$, $\alpha$, $m$ and $n$), is


(1.1)    $e(\theta, \alpha, m, n) = n^*/n$
where $n^*$ (not necessarily integer-valued) is defined by
(1.2)    $p_{m,n}(\theta) = \beta_{m^*,n^*}(\theta)$,  $m/n = m^*/n^*$.


Assume that the first derivatives of the power functions are continuous at
$\theta = 0$, with values $\dot{p}_{m,n} = p'_{m,n}(0)$ and $\dot{\beta}_{m,n} = \beta'_{m,n}(0)$, respectively. Then
condition (1.2) reduces in the vicinity of the hypothesis to


(1.3)    $\dot{p}_{m,n} = \dot{\beta}_{m^*,n^*}$,  $m/n = m^*/n^*$,


which can often be easily expressed in the form of an asymptotic series, as for
the sign test, which is done in Section 6.

In Sections 2, 3, and 4 we shall derive an approximation to the efficiency in 
terms of (1.3) for the one-sided Wilcoxon test, using the Edgeworth expansion 
up to terms of order O(n’). This expansion gives a good approximation to the 
null-distribution of the Wilcoxon test, [4] and [7], and its applicability in our 
problem seems also to be largely confirmed by comparison with the exact values, 


Received March 23, 1959; revised January 27, 1960. 

¹ This investigation was supported in part by a research grant (No. G-3666) from the
National Institutes of Health, Public Health Service.

² Institut für Angewandte Mathematik und Institut für Angewandte Mathematik und
Mechanik der DVL.







406 H. WITTING 


where such a comparison is possible. Whether it presents a valid asymptotic ex- 
pression is an open question, which appears to be of secondary importance with 
regard to getting approximate values for the efficiency, but in any case we shall 
denote the error of the Edgeworth expansion by the O-symbol of the first ne- 
glected term to indicate in a simple manner what terms of this expansion are 
taken into account. In Section 5 we shall give the results of a corresponding 
analysis for the two-sided Wilcoxon test. 


2. Edgeworth approximation of the Mann-Whitney statistic. The Mann-Whitney
statistic $U$ for the Wilcoxon test is


(2.1)    $U = \sum_{i=1}^{m} \sum_{j=1}^{n} \varphi(X_i, Y_j)$,  $\varphi(X, Y) = 1\ (0)$ if $X > Y$ ($X \leq Y$).


The first four moments of $U$ under the null hypothesis $\theta = 0$ are [4]

(2.2)    $EU = \xi = mn/2$,  $\operatorname{Var} U = \mu_2 = mn(m + n + 1)/12$,  $\mu_3 = 0$,

         $\mu_4 = [mn(m + n + 1)/240][5(m^2 n + mn^2) - 2(m^2 + n^2) + 3mn - 2(m + n)]$.
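
The null moments can be checked for small samples by counting arrangements. The sketch below is an illustration only: it assumes $U$ counts the pairs with $X_i > Y_j$, that all orderings of the pooled sample are equally likely under $\theta = 0$, and that the moment formulas read $EU = mn/2$, $\operatorname{Var} U = mn(m+n+1)/12$, $\mu_3 = 0$ and $\mu_4 = [mn(m+n+1)/240][5(m^2n + mn^2) - 2(m^2+n^2) + 3mn - 2(m+n)]$. The helper names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count(m, n, u):
    """Number of orderings of m X's and n Y's with exactly u pairs X > Y."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The smallest observation is either an X (adds no pairs)
    # or a Y (every one of the m X's exceeds it, adding m pairs).
    return count(m - 1, n, u) + count(m, n - 1, u - m)


def exact_moments(m, n):
    """Mean and 2nd, 3rd, 4th central moments of U under the null hypothesis."""
    total = comb(m + n, m)
    pmf = [Fraction(count(m, n, u), total) for u in range(m * n + 1)]
    mean = sum(u * p for u, p in enumerate(pmf))

    def central(r):
        return sum((u - mean) ** r * p for u, p in enumerate(pmf))

    return mean, central(2), central(3), central(4)


def formula_moments(m, n):
    """Closed forms as read from the moment formulas quoted in the lead-in."""
    mean = Fraction(m * n, 2)
    mu2 = Fraction(m * n * (m + n + 1), 12)
    mu4 = Fraction(m * n * (m + n + 1), 240) * (
        5 * (m * m * n + m * n * n) - 2 * (m * m + n * n) + 3 * m * n - 2 * (m + n)
    )
    return mean, mu2, Fraction(0), mu4


print(exact_moments(2, 2) == formula_moments(2, 2))
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point tolerance.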


General expressions for the first four moments are given by R. M. Sundrum [6],
from which the following formulae for the derivatives at $\theta = 0$ can be obtained


— = —mnA 
mn(m — n)(2B — A) 
4Am'n’® + (—3A + 3B — 3C)(—4m'n? + m'n + mn’ + m’n + mn’) 
(2.3) ji. = (—44 + B)(m'n® — mn‘) + (2B — 6C + 4D)(m'‘n — mn‘) 
+ (54/2 — 27B + 66C — 44D)(m'n® — m’n*) 


+ (—A + 12B — 30C + 20D)(m'n — mn’), 
with the abbreviations


(2.4)    $A = \int f^2\, dx$,  $B = \int F f^2\, dx$,  $C = \int F^2 f^2\, dx$,  $D = \int F^3 f^2\, dx$.


In the particular case that the underlying distribution is symmetric, $F(x) +
F(-x) = 1$, we have


(2.5)    $2B = A$,  $4D = -A + 6C$,
so that (2.3) simplifies to

(2.6)    $\dot{\xi} = -mnA$,  $\dot{\mu}_2 = 0$,  $\dot{\mu}_4 = 0$,

         $\dot{\mu}_3 = \tfrac{1}{2}Am^2n^2 + (A - 3C)(-4m^2n^2 + m^3n + mn^3 + m^2n + mn^2)$.





PITMAN EFFICIENCY 407 


Lehmann [5], among others, proved that U is asymptotically normally dis- 
tributed. As mentioned above, the Edgeworth expansion with continuity correc- 
tion up to terms O(n™) can be applied as an asymptotic expression, i.e., the 
power of this test can be approximated [2] by 
(2.7 p(U S u|m,n) 

P ) (2 (3) - 

= o(z) + exp (x) + ene (x) + ep (x) + 86(2) +0(n™) 
for a fixed value of $x$, where $x$ is the normalized value of $u$ with continuity
correction,


(2.8)    $x = (u + \tfrac{1}{2} - EU)/(\mu_2)^{1/2}$.
The Edgeworth coefficients, 


d= i( 
*” 4ati\Gg 


6 = & = 0, 


—1/2 - 
are of order n“”, n 


order O(n™*”). 

In our problem, however, z is not a fixed constant, since the significance 
probability a is considered to be given, and the location parameter @ tends to 
zero. More precisely, on the one hand, u and a@ are connected by (2.7), for 
6 = 0, by 
(2.10) P(U < ulm, n) = (20) + ex” (x) + O(n™) = a, 


where 2 is the normalized value of u for 6 = 0, which can be determined from 
a by solving (2.10) asymptotically using e; = O(n"): 


= ut > ve mn/2 
(2.11) * (mn(m + n + 1)/12)!# 


= @'(a) — 6 [(3@ (a) — {@ “(a)}] + O(n”). 


On the other hand, $x$ and $x_0$ are connected by (2.8), which can be written in
the form


(2.12)    $x = x_0 + x_1\theta + O(\theta^2)$,  $x_1 = x'(0) = \dfrac{mnA}{(\mu_2)^{1/2}} - \dfrac{x_0\, \dot{\mu}_2}{2\mu_2}$,


where $x_1 = O(n^{1/2})$.


Using the fact that both the normalized variable z and the Edgeworth co- 
efficients depend on 6, and by differentiating (2.7) with respect to 6 at 6 = 0, 
we get’ 


3 Here the remainder is still O(n-**), since ég(z) vanishes for @ = 0. 







(2.13) Pp = a(x) + este (x0) + ep’ (x0) + ee (x0) + O(n), 
or, by means of (2.2), (2.11) and (2.12), 


p me o(@ (a) ) (= 12mn =) 4- 2 @ Ma ) 
+n+i1) ~ “= Qu: 


7 12mn 7 a 2 
- (2 -—3 es Ae) (i — 2 (a)} ) 


+ Fi eS{@ "(a)" + &(38 (a) — ["(a)} ‘| + O(n”). 
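In particular, for symmetric distributions $\dot{\mu}_2 = \dot{\mu}_4 = 0$ and (2.14) simplifies to

(2.15)    $\dot{p} = \varphi(\Phi^{-1}(\alpha)) \left( \dfrac{12mn}{m + n + 1} \right)^{1/2} A\, [1 + K_{m,n,A,C}(1 - \{\Phi^{-1}(\alpha)\}^2) + O(n^{-2})]$

with


$K_{m,n,A,C} = \dfrac{mn + 2(1 - 3(C/A))(-4mn + m^2 + n^2 + m + n) - 0.15(m^2 + n^2 + mn + m + n)}{mn(m + n + 1)}$.
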
3. Efficiency relative to the standard normal test. Let $X_i$ and $Y_j$ be normally
distributed with known variance $\sigma^2$. Then $((\bar{Y} - \bar{X})/\sigma)(mn/(m + n))^{1/2}$ is


$N((\theta/\sigma)(mn/(m + n))^{1/2}, 1)$


and the power of the standard normal test, written Z-test for short, is


(3.1)    $\beta(\theta) = \Phi((\theta/\sigma)(mn/(m + n))^{1/2} - \Phi^{-1}(1 - \alpha))$

         $= \alpha + (\theta/\sigma)(mn/(m + n))^{1/2}\, \varphi(\Phi^{-1}(\alpha)) + O(\theta^2)$.


Therefore, (1.3) can easily be solved for $n^*/n$ in terms of $m$ and $n$, and we get
for the efficiency relative to the standard normal test


(3.2)    $e_{as,Z} = ((1/m) + (1/n))\, \sigma^2\, (\dot{p}/\varphi(\Phi^{-1}(\alpha)))^2$

         $= 12\sigma^2 \left( \int f^2(x)\, dx \right)^2 \{1 - (1/(m + n + 1)) + 2K_{m,n,A,C}[1 - \{\Phi^{-1}(\alpha)\}^2] + O(n^{-2})\}$,


where $A = 1/(2\pi^{1/2}) = 0.282095$, $1 - 3C/A = 0.087733$.

For small $m$ and $n$, the exact values of $\dot{p} = p'(0)$ can be derived from the
integrals by which the power, $p(\theta)$, is represented. (Dixon [3] has used these
integrals for computing the power numerically for some values of $\theta > 0$.) So
a comparison is possible. The accuracy becomes worse, of course, for decreasing
values of $m$ and $n$. Some comparisons are shown in Table 3, Section 4.
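
The asymptotic efficiency is easy to evaluate numerically. The sketch below assumes the following readings of the formulas: $e_{as,Z} = P\{1 - 1/(m+n+1) + 2K[1 - \{\Phi^{-1}(\alpha)\}^2]\}$ with Pitman value $P = 12\sigma^2(\int f^2 dx)^2$ and $K = [mn + 2(1 - 3C/A)(-4mn + m^2 + n^2 + m + n) - 0.15(m^2 + n^2 + mn + m + n)]/[mn(m+n+1)]$, and, for the t-test comparison of Section 4, $e_{as,t} = e_{as,Z}(1 + \{\Phi^{-1}(\alpha)\}^2/(2(m+n-2)P))$. The function and parameter names are hypothetical; the remainder terms are simply dropped.

```python
from math import pi
from statistics import NormalDist


def k_coefficient(m, n, one_minus_3c_over_a):
    """First-order coefficient K for a symmetric underlying distribution."""
    numerator = (
        m * n
        + 2 * one_minus_3c_over_a * (-4 * m * n + m * m + n * n + m + n)
        - 0.15 * (m * m + n * n + m * n + m + n)
    )
    return numerator / (m * n * (m + n + 1))


def efficiency_vs_z(m, n, alpha, pitman, one_minus_3c_over_a):
    """Asymptotic efficiency of the one-sided Wilcoxon test relative to the Z-test."""
    z = NormalDist().inv_cdf(alpha)
    k = k_coefficient(m, n, one_minus_3c_over_a)
    return pitman * (1 - 1 / (m + n + 1) + 2 * k * (1 - z * z))


def efficiency_vs_t(m, n, alpha, pitman, one_minus_3c_over_a):
    """Asymptotic efficiency relative to the t-test (assumed reading of (4.5))."""
    z = NormalDist().inv_cdf(alpha)
    e_z = efficiency_vs_z(m, n, alpha, pitman, one_minus_3c_over_a)
    return e_z * (1 + z * z / (2 * (m + n - 2) * pitman))


# Normal case: Pitman value 3/pi, 1 - 3C/A = 0.087733 (constants quoted in the text).
print(efficiency_vs_z(4, 4, 0.0571, 3 / pi, 0.087733))
print(efficiency_vs_t(4, 4, 0.0571, 3 / pi, 0.087733))
# Rectangular case: Pitman value 1, 1 - 3C/A = 0.
print(efficiency_vs_z(10, 10, 0.0315, 1.0, 0.0))
```

Under these assumptions the three printed values agree to about three decimals with the tabulated entries 0.7817, 0.9518 and 0.8308, which supports the reading of the formulas used here.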





TABLE 1
Comparison of the efficiency values e_{ex,Z} and e_{as,Z} for underlying rectangular distribution

                     m = n = 20     m = 20, n = 10     m = n = 10
α                      0.0298           0.0245           0.0315
e_{ex,Z}               0.9071           0.8780           0.8210
e_{as,Z}             [illegible]        0.8838           0.8308
relative error         0.22%            0.66%            1.19%


TABLE 2
Efficiency values e_{ex,Z} for θ > 0 compared with the corresponding limit values for θ → 0 for
underlying rectangular distribution

        m = n = 20                      m = n = 10
α            0.0298             α            0.0315
e(0)         0.9071             e(0)         0.8210
e(0.01)      0.9031             e(0.05)      0.8139
e(0.02)      0.9022             e(0.10)      0.8053

When the $X_i$ and $Y_j$ are distributed according to any other distribution with
finite fourth moment, $\mu_4$, and known $\sigma^2$, then $((\bar{Y} - \bar{X})/\sigma)(mn/(m + n))^{1/2}$ is
normally distributed up to terms O(n™*), and (3.2) can also be applied. For the 
rectangular distribution $R(0, 1)$, in particular, we have $A = 1$, $1 - 3C/A = 0$.
In this case the exact values of $\dot{p}$ can be shown to be


(3.4)    $\dot{p} = -(m + n)\, p_0(U \leq u \mid m, n) + m\, p_0(U \leq u \mid m - 1, n) + n\, p_0(U \leq u \mid m, n - 1)$

         $= n[p_0(U \leq u \mid m, n - 1) - p_0(U \leq u - m \mid m, n - 1)]$.


This comes from a private communication of Professor J. L. Hodges, Jr., which
is based on the fact that only those $X_i$ and $Y_j$ which fall in the interval $\theta \leq x \leq 1$
contribute anything to (2.1), and they have there the same conditional distribution,


(3.5)    $p_\theta(U \leq u \mid m, n) = \sum_{k=0}^{m} \sum_{l=0}^{n} b(m, \theta, k)\, b(n, \theta, l)\, p_0(U \leq u \mid m - k, n - l)$.

Expressions (3.4) and (3.5) can be evaluated numerically for $m, n \leq 20$ by
means of the tables of D. Auble [1] (Tables 1-4), on which the comparisons
and statements are based.
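
Where tables are not at hand, the null distribution of $U$ can be generated by a simple counting recursion, which also makes the two forms of the exact derivative easy to compare. The sketch below is illustrative only: it assumes $U$ counts pairs with $X_i > Y_j$, reads the exact derivative as $\dot{p} = -(m+n)p_0(U \leq u \mid m,n) + m\,p_0(U \leq u \mid m-1,n) + n\,p_0(U \leq u \mid m,n-1) = n[p_0(U \leq u \mid m,n-1) - p_0(U \leq u-m \mid m,n-1)]$, and takes $\sigma^2 = 1/12$ for $R(0,1)$ in the efficiency $(1/m + 1/n)\sigma^2(\dot{p}/\varphi(\Phi^{-1}(\alpha)))^2$. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb
from statistics import NormalDist


@lru_cache(maxsize=None)
def count(m, n, u):
    """Number of orderings of m X's and n Y's with exactly u pairs X > Y."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Smallest observation is an X (adds 0 pairs) or a Y (adds m pairs).
    return count(m - 1, n, u) + count(m, n - 1, u - m)


def null_cdf(u, m, n):
    """P(U <= u) under the null hypothesis for sample sizes m and n."""
    if u < 0:
        return 0.0
    top = min(u, m * n)
    return sum(count(m, n, j) for j in range(top + 1)) / comb(m + n, m)


def pdot_three_term(u, m, n):
    """First form of the exact derivative of the power at theta = 0."""
    return (
        -(m + n) * null_cdf(u, m, n)
        + m * null_cdf(u, m - 1, n)
        + n * null_cdf(u, m, n - 1)
    )


def pdot_two_term(u, m, n):
    """Second, shorter form of the same derivative."""
    return n * (null_cdf(u, m, n - 1) - null_cdf(u - m, m, n - 1))


def exact_efficiency_rectangular(u, m, n):
    """Return (alpha, efficiency relative to the Z-test) for critical value u."""
    alpha = null_cdf(u, m, n)
    z = NormalDist().inv_cdf(alpha)
    phi = NormalDist().pdf(z)
    sigma2 = 1 / 12                      # variance of R(0, 1)
    return alpha, (1 / m + 1 / n) * sigma2 * (pdot_two_term(u, m, n) / phi) ** 2


alpha, eff = exact_efficiency_rectangular(25, 10, 10)
print(alpha, eff)
```

The equality of the two forms follows from the same recursion that generates the counts, so comparing `pdot_three_term` and `pdot_two_term` is a useful check on any implementation.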

The concept of efficiency for nearby alternatives is valid only in the special
case of small values of $\theta$, but here the choice of the appropriate test is of special
importance. The zero-order approximation of (3.2) is the well-known Pitman
efficiency, which is $3/\pi = 0.955$ in the case of normal alternatives and 1 for
rectangular alternatives. The first-order approximation now indicates how the
efficiency approximately changes with the significance probability $\alpha$ and the
sample sizes $m$ and $n$. Besides depending on the Pitman parameter $\sigma^2(\int f^2(x)\, dx)^2$,
it depends on the special underlying distribution only through the parameter
$C = \int F^2(x) f^2(x)\, dx$.


4. Efficiency relative to the t-test. A more realistic comparison than that made
in the preceding section is one made with the t-test, which is appropriate for
unknown, but common variance $\sigma^2$. Let us restrict ourselves to underlying
normal distributions. Expanding the density $t_\delta(x)$ of the noncentral t-distribution
with respect to the noncentrality parameter $\delta = (\theta/\sigma)(mn/(m + n))^{1/2}$, one
can verify that the derivative of the power function, $\beta(\theta) = \int_C^\infty t_\delta(x)\, dx$, at
$\theta = 0$ is given by


1 mn 1/2 1 ( C ae 2)/2 
pel) all tagsaa 


1f mn \” 5 ( - -2 ) 

“; (nr) ACN) + pay tO): 

Here C can be determined asymptotically from a by expanding the central 
t-distribution tail integral with f = (m + n — 2) degrees of freedom for large 


values of f, 
- e m+n—4 “] 
a= [ to(x) dx = 1 [ec (2t+-—4) 


(4.1) 


(4.2) 


l (3) ' mto—*)"] 2 
4(m +n — 6)* [ (™ to-4) +On"), 


and solving for C. One obtains 


: m+n— 2\"" 1 4( ) (307 
(43) C= es (®@ (1 — a) + (1/4(m + n — 6))(38 (a) 

— {@"(a)}*) + O(n™)). 
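Therefore, asymptotically,


(4.4)    $\dot{\beta} = (1/\sigma)(mn/(m + n))^{1/2}\, \varphi(\Phi^{-1}(\alpha))\{1 - (1/4(m + n - 2))\{\Phi^{-1}(\alpha)\}^2 + O(n^{-2})\}$.

Solving (1.3) asymptotically we get

(4.5)    $e_{as,t} = e_{as,Z}\left(1 + \dfrac{\{\Phi^{-1}(\alpha)\}^2}{24(m + n - 2)\,\sigma^2 (\int f^2(x)\, dx)^2} + O(n^{-2})\right)$.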


For comparison of the asymptotic and the exact values, we give the following
efficiency values; the values e_{ex,t} are obtained from (4.1) according to (1.3)
by linear interpolation, after having taken C from Table 3, a table of the central
t-distribution.


TABLE 3
Comparison of the efficiency values relative to the Z-test and the t-test for underlying normal
distribution

                  m = n = 4            m = n = 5         m = 6, n = 4
α              0.0571   0.0286      0.0278   0.0159         0.0190
e_{ex,Z}       0.7340   0.7222      0.7607   0.7304         0.7408
e_{ex,t}       0.9772   0.9825      0.9774   0.9775         0.9749
e_{as,Z}       0.7817   0.7311      0.7713   0.7369         0.7455
e_{as,t}       0.9518   0.9620      0.9563   0.9593         0.9553


TABLE 4
Efficiency values e_{as,t} for different sample sizes m and n and reasonable values of α for
underlying normal distribution

[the six column headings giving m and n are illegible]

α
0.100       0.9404   0.9466   0.9505   0.9437   0.9466   0.9488
0.050       0.9469   0.9498   0.9521   0.9471   0.9468   0.9505
0.010       0.9578   0.9566   0.9558   0.9531   0.9461   0.9542
0.005       0.9602   0.9591   0.9573   0.9546   0.9452   0.9556

The following table of efficiency values, Table 4, indicates that the Wilcoxon 
test compared with the t-test is nearly as good locally for moderate sample sizes 
as it is known to be asymptotically. 


5. Efficiency for the two-tail Wilcoxon test. Up to now we considered only the
case of testing the hypothesis $\theta = 0$ against the one-sided alternatives $\theta > 0$
by one-tail tests. The analysis was based on the comparison of the first derivatives
of the power functions at $\theta = 0$.

A similar analysis can be done in the case of two-tail tests for testing the
hypothesis $\theta = 0$ against the two-sided alternatives $\theta \neq 0$. In the particular
case that the first derivatives, $p'_{m,n}(\theta)$, $\beta'_{m,n}(\theta)$, both vanish at $\theta = 0$, condition
(1.2) reduces to


(5.1)    $\ddot{p}_{m,n} = \ddot{\beta}_{m^*,n^*}$,  $m/n = m^*/n^*$,


where $\ddot{p}_{m,n} = p''_{m,n}(0)$ and $\ddot{\beta}_{m,n} = \beta''_{m,n}(0)$ are the second derivatives of the
corresponding power functions at the hypothesis.

An analysis similar to that discussed in the preceding sections gives the fol- 
lowing asymptotic expressions for the efficiency of the two-sided Wilcoxon test 
compared with the corresponding standard-normal and t-tests, respectively : 







m+n+1 — mn(m + n+ 1) 
(5.2) (3 -: 2 ("(a/2)"*) + Im+n+1A_ m+n+1 


Cas = 120°A’ [! = 1 m +n +mn+m+n 


2 mn A? mn 


+ O(n ‘| 


pianist -2 
(53) Cast = Caszéz (1 + 24(m - > oe cos 2) )e Af (zx) dx) ot O(n ) ° 
In contrast to the known fact that the Pitman efficiency for the two-tail test
is the same as for the one-tail test, the correction terms of order $O(n^{-1})$ differ
from those of (3.2) and (4.5), respectively. On the other hand they depend on
the special underlying distribution only through the Pitman parameter $12\sigma^2A^2$,
and


(5.4)    $A = \int f^2(x)\, dx$.


6. Efficiency of the sign test. Now, let $X_1, \cdots, X_n$ be independent and
identically distributed according to the distribution $F(x - \theta)$, with $F(0) = \tfrac{1}{2}$
and with a density $f(x)$ continuous at $x = 0$. For testing the hypothesis $\theta = 0$
against the alternative $\theta > 0$ we consider the sign test with the power function


(6.1) p,(0) = >. ("\pra 7 


(6.2)    $p = \tfrac{1}{2} + \theta f(0) + O(\theta^2)$,  $q = 1 - p = \tfrac{1}{2} - \theta f(0) + O(\theta^2)$.


Here the Edgeworth series, integrated according to the Euler-Maclaurin sum- 
mation formula, can be checked by a direct expansion of (6.1). For fixed z, 


a — a 
p,(0) = 1 (x) + 5 “ee aia ? (x 


sd (2) — 5 1 — API (2) + 89(z) + O(n"), 
~ npq 72) «npg 


(6.3) 


where x is the normalized value of k with continuity correction, and é@¢(z) 
symbolizes the terms of order n™*’, which vanish for @ = 0. For small values 
of 6, (6.3) simplifies to 


pPr(0) = 1 — g(x) — se © (xr) + x ¢'(z) 


(6.4) 
+ | y(z) + O(n) + 016), 
12n 





with 


ae. aan / 
“= mo =%+ 270+ 008), xm = —2n'f(0). 


(65) 2z 


% and a are connected by p,(0) = a, which can be solved asymptotically for 
Zo as follows: 


(6.6) 29 =¢@ '(1—a) — (1/12n)6'(a) + (1/12n){o"(a)*} + O(n”). 
On the other hand, (6.4) gives, for fixed a, 


Pp, = —X(%) — 5 ¢” (x0) + a, 0 (0) 
(6.7) 


+ a ¢”’ (x) + O(n”), 
m 
or, by (6.5) and (6.6), 
(6.8) Pn = o(b ‘(a))2n"f(0)(1 + (1/4n) — (1/12n)i¢"(a)}* + O(n™)). 


Let us first compare the sign test with the #-test, the power function of which 
is 


#\1/2 #\1/2 
B.-(0) = ¢ (ere -¢~ «)) = a+ (¢%a)) ™)? 


(6.9) 
+ 0(6)’). 


Expression (6.9) holds exactly for normally distributed random variables and 
up to (relative) errors of the order O(n”) for any other distribution with finite 
fourth moment. Therefore, in the vicinity of the hypothesis, the efficiency of 
the sign test compared with the Z-test is given by 


eh Pi ee os 
ee ge Tee (sass) Af Oe 


(6.10) ' . 

: E + (1/2n) — 1 SOE + orn »|, 

6n 
the first term of which is the well-known Pitman value for the efficiency. For 
the sign test with a value of a ~ @ '(— 3"), which is a reasonable choice, 
the first order correction is small and the approximation of the Pitman value 
is an especially good one. 
Corresponding to (4.5) a comparison with the t-test gives 
\@ “(a)}* 


Cast = Cas.z ¢ + 8nf*(O)o? + O(n ) 


= 4f°(0)o°(1 + (1/2n) + {@ “(a)}*[(1/8nf"(0)o") 
— (1/6n)} + O(n™)). 


(6.11) 







7. Acknowledgment. The author wishes to express his sincere thanks to 
Professors J. L. Hodges, Jr., L. LeCam, and E. L. Lehmann for helpful dis- 
cussions and interest in this work. 

REFERENCES 

[1] Donovan Auble, "Extended tables for the Mann-Whitney statistic," Bull. Inst. Educ.
Research, Indiana University, Vol. 1 (1953), 39 pp.

[2] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press,
Princeton, 1946.

[3] W. J. Dixon, "Power under normality of several nonparametric tests," Ann. Math.
Stat., Vol. 25 (1954), pp. 610-614.

[4] Evelyn Fix and J. L. Hodges, Jr., "Significance probabilities of the Wilcoxon test,"
Ann. Math. Stat., Vol. 26 (1955), pp. 301-312.

[5] E. L. Lehmann, "Consistency and unbiasedness of nonparametric tests," Ann. Math.
Stat., Vol. 22 (1951), pp. 165-179.

[6] R. M. Sundrum, "A further approximation to the distribution of Wilcoxon's statistic
in the general case," J. Roy. Stat. Soc., Ser. B, Vol. 16 (1954), pp. 255-268.

[7] H. Ury and V. J. Chacko, "Significance probabilities for the one-sample Wilcoxon
test," submitted to Ann. Math. Stat.





CONTRIBUTIONS TO THE THEORY OF RANK ORDER 
STATISTICS: THE TWO-SAMPLE CENSORED CASE¹


By U. V. R. Rao, I. R. Savage,² and M. Sobel
Indiana University, University of Minnesota, Bell Telephone Laboratories 

0. Summary. Rank order theory is developed for the two-sample problem in 
which censoring of the observations has occurred, i.e., not all of the random 
variables are observed. The approach is similar to [2] with the striking difference 
that in the present case the rank orders are not all equally likely under the null 
hypothesis, and thus it becomes important to work with the likelihood ratios 
of rank orders. In applying the results of this paper, there will be a strong analogy 
to sequential analysis. The censoring scheme corresponds to the stopping rule 
and in both cases the terminal decision should be based on the likelihood ratio. 


We do not give the detailed applications of the present theory either to earlier 
procedures or to the new ones introduced here. 


1. Introduction. Consider two ordered sets of numbers $(x_1, \dots, x_m)$ to be
called the first sample, and $(y_1, \dots, y_n)$ to be called the second sample, i.e.,
$x_i < x_j$ $(1 \le i < j \le m)$ and $y_i < y_j$ $(1 \le i < j \le n)$ and define $A(a, b) = 0(1)$
if $a > b$ $(a \le b)$. If $A(x_i, y_j)$ is known for all values of i and j it is possible
to combine the two sets and arrange them from smallest to largest. In that case, 
if the two sets of numbers correspond to observations from two random samples 
(with no ties), it is possible to apply the usual nonparametric procedures, e.g., 
Wilcoxon, Kolmogoroff-Smirnoff. In some statistical applications it may be 
necessary, and can be desirable, not to observe all of the order relationships 
between the two samples. Thus, if the measuring device is such that it is very 
inaccurate for small values one may learn only how many random variables 
occurred in each sample below some threshold and the order relationships be- 
tween the observed random variables above the threshold. This same difficulty 
could also occur for large values or for both large and small values. In life 
testing, savings in experimental costs and time are often effected by stopping the 
experiment before all of the lives are completed. In that case one has the order 
relationships between the smaller random variables, between the smaller ones 
and larger ones, but not between the larger ones. 

The examples above will not all be amenable to the present treatment. The 
censoring schemes that we will handle are those that depend on rules telling 
which order statistics of the combined sample to observe and not which values 


Received August 28, 1959.

¹ Work begun at the 1958 Summer Statistical Institute sponsored by the National Science
Foundation.

² Work conducted in part under contract Nonr 2582(00), Task NR 042-200 at the University
of Minnesota. Reproduction in whole or in part is permitted for any purpose of the
United States Government.


415 





416 U. V. R. RAO, I. R. SAVAGE AND M. SOBEL 


of the random variable. In the case of the instrument incapable of measuring 
small values the censoring scheme depends on the values of the random variables 
and therefore is not based on order statistics. That example could, however, be 
modified in the following manner. Wait until the first p per cent of the articles 
are passed (screened so that it is known that they are smaller than the remaining 
ones) and then measure the remainder. If p is chosen carefully there will (with 
high probability) be little difficulty in making the measurements. Actually, 
measurement would not be necessary since only the order relationships between 
the larger observations are required. In life testing where one waits for a fixed 
number of failures the censoring scheme is already the desired form. 

To describe a censoring scheme more precisely we introduce the following
notation. Let $z_i = 0(1)$ if the i-th smallest in the combined sample of x's and
y's is from the first (second) sample. Then $z = (z_1, \dots, z_{m+n})$ is an (uncensored)
rank order. If 0 corresponds to a unit move to the right and 1 to a unit move
up then each rank order can be represented by a path of horizontal and vertical
unit movements on the integer lattice from (0, 0) to (m, n). Censoring schemes
amenable to the present treatment can be described in terms of this lattice,
e.g., the censoring scheme which continues experimentation until one of the
lattice points whose coordinates add to N* is reached is the censoring scheme
which tells one to continue until the N* smallest random variables of the
combined sample have been observed.
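The lattice description can be illustrated with a small sketch. It is not part of the original paper: the function names `rank_order`, `lattice_path` and `censor_at_total`, and the two numeric samples, are hypothetical and chosen only for illustration, assuming no ties between observations.

```python
def rank_order(xs, ys):
    """z[i] = 0 if the (i+1)-th smallest pooled value comes from xs, 1 if from ys."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    return [label for _, label in pooled]


def lattice_path(z):
    """Lattice points visited: 0 moves one unit right, 1 moves one unit up."""
    path = [(0, 0)]
    for step in z:
        a, b = path[-1]
        path.append((a + 1, b) if step == 0 else (a, b + 1))
    return path


def censor_at_total(z, n_star):
    """Keep only the n_star smallest pooled observations (stop when a + b = n_star)."""
    return z[:n_star]


xs = [0.3, 1.2, 2.5]          # hypothetical first sample (m = 3)
ys = [0.7, 0.9, 3.1, 4.0]     # hypothetical second sample (n = 4)
z = rank_order(xs, ys)
z_censored = censor_at_total(z, 4)
print(z, lattice_path(z)[-1])
print(z_censored, lattice_path(z_censored)[-1])
```

The uncensored path ends at (m, n), while the censored path ends on the line where the coordinates add to N*, which is the stopping set of that scheme.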

In this paper we consider explicitly the following type of censoring scheme: 
Let S be a set of lattice points such that every path from (0, 0) to (m, n) has
at least one point in common with S, and S does not include (0, 0). Start experi-
mentation by observing the smaller random variables in the combined sample 
first and continue experimentation until a point in S is reached. 

Thus, for censoring schemes explicitly considered we observe the "smaller"
random variables only. The precise meaning of "smaller" depends on the particular
censoring scheme. Therefore, the observed rank order is of the form
$z = (z_1, \dots, z_{N^*})$ where $z_i = 0(1)$ if the ith smallest random variable comes from
the first (second) sample. Depending on the nature of the censoring scheme N*
can also be the observed value of a random variable. Note that in writing the
vector z it is not necessary to know the sample sizes m and n. When computing
Pr (Z = z), however, the values of m and n will always be required. The sample
sizes will appear explicitly in the various formulas and be implicit in the
discussion.

The following notation and assumptions are used: The random variables,
$X_1, \dots, X_m, Y_1, \dots, Y_n$, are mutually independent. The X's (Y's) have a
common continuous cumulative distribution function F(x) [G(x)]. The corresponding
density (assumed to exist) will be denoted by f(x) [g(x)].


2. Censoring Schemes. If all m + n = N of the random variables are observed
there are $\binom{N}{m}$ possible rank orders. It is of some interest to find the total number
of rank orders in censoring schemes of the kind to be considered, i.e., the "smaller"
random variables are observed. A rank order, z, is said to be "redundant"
if there exists an antecedent rank order z′ such that the occurrence of z′ implies
the occurrence of z. Thus if m = 2 and n = 2, then z = (0101) is redundant
since z will occur whenever z′ = (010) occurs. With any scheme where only
order is to be used, sampling can be stopped if only redundant rank orders
remain to be observed.
LEMMA 2.1.

(1) The number of non-redundant rank orders for all censoring schemes involving
the observation of the smaller random variables is $2\binom{N}{m} - 2$.

(2) Including redundant rank orders there are $\binom{N+2}{m+1} - 2$ possible rank
orders.

PROOF. The number of non-redundant rank orders consists of two parts.

(a) Those rank orders consisting of $a \le m - 1$ $(b \le n - 1)$ observations
from the F(x) [G(x)] population. The number of these is

$$\sum_{a=0}^{m-1}\sum_{b=0}^{n-1}\binom{a+b}{a} = \binom{N}{m} - 2.$$

(In the summation exclude a = b = 0.)

(b) Those rank orders where the number of observations from F(x) [G(x)] is
m (n). When the number of observations from F(x) is m the rank order must
end in 0 and have less than n from G(x). Hence the total number of rank orders
of this form is

$$\sum_{b=0}^{n-1}\binom{m-1+b}{m-1} + \sum_{a=0}^{m-1}\binom{n-1+a}{n-1} = \binom{N-1}{m} + \binom{N-1}{n} = \binom{N}{m}.$$

The conclusion then follows by adding the results from parts (a) and (b). Part
(2) of the lemma is proved in a similar manner.

As soon as one considers rank orders with differing values of N* it is important
to note that the rank orders need not correspond to disjoint events. Hence, the
sum of the probabilities of all rank orders will be greater than one. A redundant
rank order and its antecedent correspond to the same event. If z is a non-redundant
rank order and $z_0$ $(z_1)$ is formed from z by observing an additional
random variable from F(x) [G(x)] then the event z is the union of the events $z_0$
and $z_1$. By repeated use of the preceding one can compute the probabilities of
all of the rank orders if all of the probabilities of rank orders with N* = N have
been computed. To illustrate, the possible rank orders are listed for the case
m = 2 and n = 3. The redundant rank orders are set equal to their antecedents.
There are 33 rank orders, 18 of which are non-redundant.







    0
        00 = 001 = 0011 = 00111
        01
            010 = 0101 = 01011
            011
                0110 = 01101
                0111 = 01110
    1
        10
            100 = 1001 = 10011
            101
                1010 = 10101
                1011 = 10110
        11
            110
                1100 = 11001
                1101 = 11010
            111 = 1110 = 11100
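The two counts quoted for m = 2 and n = 3 can be checked by brute force. The following is a minimal sketch, not taken from the paper; the helper names `all_rank_orders` and `is_redundant` are hypothetical, and redundancy is tested under the assumption that a rank order is redundant exactly when dropping its last symbol leaves a rank order that already contains m zeros or n ones.

```python
from itertools import product
from math import comb


def all_rank_orders(m, n):
    """All non-empty 0/1 strings with at most m zeros and at most n ones."""
    out = []
    for length in range(1, m + n + 1):
        for bits in product("01", repeat=length):
            if bits.count("0") <= m and bits.count("1") <= n:
                out.append("".join(bits))
    return out


def is_redundant(z, m, n):
    """True if the antecedent z[:-1] already exhausts one of the two samples."""
    prefix = z[:-1]
    return prefix.count("0") == m or prefix.count("1") == n


m, n = 2, 3
orders = all_rank_orders(m, n)
non_redundant = [z for z in orders if not is_redundant(z, m, n)]
print(len(orders), len(non_redundant))
print(comb(m + n + 2, m + 1) - 2, 2 * comb(m + n, m) - 2)
```

Both printed pairs should agree with the counts stated in the text and with the two expressions of Lemma 2.1.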


Some specific censoring schemes follow:

(a) Continue experimentation until the N* smallest random variables are
observed, $0 < N^* \le N$. N* is not a random variable.

LEMMA 2.2. The number of possible rank orders under scheme (a), including
redundant ones, is

$$\sum_{i=\max(0,\,N^*-n)}^{\min(m,\,N^*)}\binom{N^*}{i},$$

so that when $N^* < \min(n, m)$ the number of rank orders is $2^{N^*}$.

(b) Continue experimentation until m* random variables from F(x) have
been observed. If n* is the number of random variables observed from G(x)
then n* and N* = m* + n* are random variables.

LEMMA 2.3. The number of non-redundant rank orders, when experimentation is
continued until m* random variables from F(x) or n from G(x) are observed, is

$$\binom{m^*+n}{n}.$$

PROOF. Some of the rank orders end with an observation from F(x). The
number of these is

$$\sum_{i=0}^{n-1}\binom{m^*-1+i}{m^*-1} = \binom{m^*+n-1}{n-1}.$$

In addition, there are those rank orders where n observations are obtained from
G(x) before m* are obtained from F(x). The number of these is

$$\sum_{i=0}^{m^*-1}\binom{n-1+i}{n-1} = \binom{m^*+n-1}{n}.$$


LEMMA 2.4. If observations are continued until m* of the random variables from
the first sample are observed, i.e., the possibility of observing redundant rank orders
is not excluded, then

$$\Pr(N^* = t) = \begin{cases} 0, & t < m^* \text{ or } t > m^* + n, \\[4pt] m\binom{m-1}{m^*-1}\binom{n}{t-m^*}\displaystyle\int F^{m^*-1}G^{t-m^*}(1-F)^{m-m^*}(1-G)^{n-t+m^*} f\,dx, & m^* \le t \le m^* + n, \end{cases}$$

and if F(x) = G(x) then

$$\Pr(N^* = t) = \begin{cases} 0, & t < m^* \text{ or } t > m^* + n, \\[4pt] \dbinom{t-1}{m^*-1}\dbinom{N-t}{m-m^*}\Big/\dbinom{N}{m}, & m^* \le t \le m^* + n, \end{cases}$$

and

$$E(N^*) = m^*(N + 1)/(m + 1) < N.$$

PROOF. In the first part of the lemma the integrand is the probability of the
desired event when the m*-smallest random variable from the first sample
occurs in the interval (x, x + dx). The integration then gives the total probability.
The second part of the lemma follows from the first by noting that
when F(x) = G(x) the integral is a Beta integral. The second part could also be
obtained by a direct combinatorial argument.
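As a numerical illustration of the null case F(x) = G(x) in Lemma 2.4, the sketch below evaluates the combinatorial form of Pr(N* = t) in exact rational arithmetic and compares the mean with m*(N + 1)/(m + 1). The function name `pr_n_star` and the values m = 4, n = 6, m* = 2 are hypothetical choices made only for this example.

```python
from fractions import Fraction
from math import comb


def pr_n_star(t, m, n, m_star):
    """Null probability that the m_star-th first-sample value is the t-th smallest overall."""
    N = m + n
    if t < m_star or t > m_star + n:
        return Fraction(0)
    return Fraction(comb(t - 1, m_star - 1) * comb(N - t, m - m_star), comb(N, m))


m, n, m_star = 4, 6, 2            # hypothetical sample sizes and stopping count
N = m + n
dist = {t: pr_n_star(t, m, n, m_star) for t in range(1, N + 1)}
total = sum(dist.values())
mean = sum(t * p for t, p in dist.items())
print(total, mean, Fraction(m_star * (N + 1), m + 1))
```

Exact fractions are used so that the comparison with the closed form for E(N*) does not depend on floating-point rounding.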

LEMMA 2.5. When F(x) = G(x), if observations are continued until either m*
of the random variables of the first sample or the n random variables of the second
sample are obtained, i.e., redundant rank orders are not observed, then

$$\Pr(N^* = t) = \begin{cases} 0, & t < \min(m^*, n) \text{ or } t > m^* + n, \\[4pt] \dfrac{\dbinom{t-1}{m^*-1}\dbinom{N-t}{m-m^*} + \dbinom{t-1}{n-1}}{\dbinom{N}{m}}, & \min(m^*, n) \le t < m^* + n. \end{cases}$$

PROOF. The proof is combinatorial. When F(x) = G(x) all of the rank orders
with N* = N are equally likely. The denominator gives the number of rank
orders with N* = N. The first term in the numerator gives the number of rank
orders ending with the m*-smallest random variable from the first population.
The second term in the numerator gives the number of rank orders ending with
the nth random variable from the second population.

(c) Continue experimentation until either the number of random variables
from F(x) is m* or the number from G(x) is n*, where m* and n* are fixed
integers.
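Lemmas 2.3 and 2.5 can be checked together by enumerating the equally likely uncensored rank orders and recording where the stopping rule cuts each one. This is a sketch under stated assumptions rather than anything from the paper: the helper names and the sizes m = 3, n = 2, m* = 2 are hypothetical, and the formula of Lemma 2.5 is applied only for t < m* + n (the stopping rule can never run to t = m* + n).

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def stopped_prefix(z, m_star, n):
    """Shortest prefix of z containing m_star zeros or n ones."""
    zeros = ones = 0
    for t, bit in enumerate(z, start=1):
        zeros += bit == 0
        ones += bit == 1
        if zeros == m_star or ones == n:
            return z[:t]
    raise ValueError("stopping rule never triggered")


def full_rank_orders(m, n):
    """All uncensored rank orders: choose which of the N positions hold zeros."""
    N = m + n
    for zero_pos in combinations(range(N), m):
        zs = set(zero_pos)
        yield tuple(0 if i in zs else 1 for i in range(N))


m, n, m_star = 3, 2, 2            # hypothetical sizes
N = m + n
terminal = {}                     # terminal rank order -> number of completions
for z in full_rank_orders(m, n):
    p = stopped_prefix(z, m_star, n)
    terminal[p] = terminal.get(p, 0) + 1


def pr_lemma_2_5(t):
    if t < min(m_star, n) or t >= m_star + n:
        return Fraction(0)
    num = comb(t - 1, m_star - 1) * comb(N - t, m - m_star) + comb(t - 1, n - 1)
    return Fraction(num, comb(N, m))


empirical = {t: Fraction(sum(c for p, c in terminal.items() if len(p) == t), comb(N, m))
             for t in range(1, N + 1)}
print(len(terminal), comb(m_star + n, n))
print(all(empirical[t] == pr_lemma_2_5(t) for t in range(1, N + 1)))
```

The dictionary `terminal` also gives the null probability of each terminal rank order as its completion count divided by the number of uncensored rank orders.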







(d) Continue experimentation until $\max_{-\infty<x<\infty} |F_{m^*}(x) - G_{n^*}(x)| \ge a_{m^*,n^*}$,
where $F_{m^*}(x)$ and $G_{n^*}(x)$ are the observed cumulative distribution functions
based on the first m* random variables from F(x) and n* random variables
from G(x), and the $a_{m^*,n^*}$ are preassigned numbers with $a_{m,n} = 0$. (See reference
[4].)

(e) Continue experimentation until $|m^* - n^*| \ge b_{m^*,n^*}$, where the $b_{m^*,n^*}$
are preassigned numbers and $b_{m,n} = 0$.

(f) Continue experimentation until the sum of the ranks squared of the
observations from G(x) exceeds $c_{m^*,n^*}$, and $c_{m,n} = 0$.


3. Theory. In this section we give formulas for the probabilities of rank orders
arising under censoring. General results for $F(x) \ne G(x)$, special cases of
$F(x) \ne G(x)$, and $F(x) = G(x)$ are considered. Likelihood ratios are defined and
limiting values are computed for the probabilities of rank orders under censoring.
Theorem 3.2 gives partial orderings of the likelihood ratios of the rank orders.

In the following we explicitly consider those rank orders involving observa- 
tions on the “smaller” values of the random variables from the combined sample. 
When a result is given for a specific rank order or set of rank orders it is pre- 
sumed that under the censoring scheme being considered these rank orders can 
occur. If the rank orders cannot occur for a particular scheme then the rela- 
tionship would not be of interest. 

The basic formula is given by

THEOREM 3.1.

$$\Pr(Z = z) = \Pr\big((Z_1, \dots, Z_{N^*}) = (z_1, \dots, z_{N^*})\big) = \frac{m!\,n!}{(m-m^*)!\,(n-n^*)!}\int W(w)\,dw,$$

where

$$\int W(w)\,dw = \int\cdots\int_{-\infty<w_1<\cdots<w_{N^*}<\infty}\ \prod_{i=1}^{N^*}\big[f^{1-z_i}(w_i)\,g^{z_i}(w_i)\,dw_i\big]\,[1-F(w_{N^*})]^{m-m^*}[1-G(w_{N^*})]^{n-n^*},$$

and $n^* = \sum_{i=1}^{N^*} z_i$, $m^* = N^* - n^*$.

PROOF. The integrand together with outside constants is composed of the
product of two multinomial probabilities: the probability that one random
variable from the first (second) sample occurs in each of the intervals
$(w_i, w_i + dw_i)$ where $z_i = 0(1)$ and $m - m^*$ $(n - n^*)$ random variables from
the first (second) sample occur in the interval $(w_{N^*}, \infty)$. The integration then
gives the total probability.

COROLLARY 3.1. When $F(x) \equiv G(x)$ then

$$P_0(z) = \Pr(Z = z) = \binom{N-N^*}{m-m^*}\Big/\binom{N}{m}.$$

PROOF. Make the change of variables $F(w_i) = G(w_i) = u_i$ in the conclusion to
Theorem 3.1. The integral then becomes

$$P_0(z) = \frac{m!\,n!}{(m-m^*)!\,(n-n^*)!}\int\cdots\int_{0<u_1<\cdots<u_{N^*}<1}\ \prod_{i=1}^{N^*} du_i\,(1-u_{N^*})^{N-N^*}$$

$$= \frac{m!\,n!}{(m-m^*)!\,(n-n^*)!\,(N^*-1)!}\int_0^1 u^{N^*-1}(1-u)^{N-N^*}\,du.$$

The last is a Beta integral.

Theorem 3.1 and Corollary 3.1 may be useful for summing finite series. We
have $\sum \Pr(Z = z) = 1$ when the summation is over all possible rank orders
that will terminate experimentation for a particular censoring scheme. Thus in
the case of the corollary we have $\sum \binom{N-N^*}{m-m^*} = \binom{N}{m}$ with the same region of
summation. Consider a special case: Stop experimentation on the N*th
observation, where $N^* < \min(m, n)$. Now when experimentation stops there will
have been observed i random variables from the first sample. And if i random
variables from the first sample have been observed there can be formed $\binom{N^*}{i}$
rank orders. Thus the summation becomes

$$\sum_{i=0}^{N^*}\binom{N^*}{i}\binom{N-N^*}{m-i} = \binom{N}{m}.$$
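The summation identity can be checked numerically from the null probabilities of Corollary 3.1. The sketch below is illustrative only; the function name `p0`, the variable `stop_at` (standing for N*), and the sizes m = 5, n = 4 are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p0(z, m, n):
    """Null probability of the censored rank order z (a tuple of 0/1 values)."""
    N, n_obs = m + n, len(z)
    m_star = z.count(0)
    return Fraction(comb(N - n_obs, m - m_star), comb(N, m))


m, n, stop_at = 5, 4, 3           # stop at the third observation; 3 < min(m, n)
N = m + n
total = sum(p0(z, m, n) for z in product((0, 1), repeat=stop_at))
series = sum(comb(stop_at, i) * comb(N - stop_at, m - i) for i in range(stop_at + 1))
print(total, series, comb(N, m))
```

Because every 0/1 string of length N* is a terminal rank order when N* is below both sample sizes, the probabilities sum to one and the series reduces to the binomial coefficient on the right.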
Of primary concern in finding good tests (decision procedures) is the likelihood
ratio, the probability of a rank order when $F(x) \ne G(x)$ divided by the
probability of the same rank order when $F(x) = G(x)$. Denote this ratio by
$L(z, F, G)$ or $L(z)$.

COROLLARY 3.2.

$$L(z) = L(z, F, G) = \Pr(Z = z)/P_0(z) = \frac{N!}{(N-N^*)!}\int W(w)\,dw.$$

In general good rank order test procedures of the hypothesis that the samples
come from the same population against the alternative that the first sample
comes from F(x) and the second sample comes from G(x) will be based on large
values of $L(z, F, G)$, i.e., rank orders which make L(z) large form the critical
region.

When $F(x) = H(x, 0)$ and $G(x) = H(x, \theta)$ one can write $L(z, F, G)$ as
$L(z, H, \theta)$ or $L(z, \theta)$. $H(x, \theta)$ is a cumulative distribution function with
parameter $\theta$, and $h(x, \theta)$ is the density of $H(x, \theta)$. In this case locally most powerful
rank order procedures can be formed for small values of $\theta$.

Assume that $\theta$ is real valued and that $dL(z, \theta)/d\theta = L'(z, \theta)$ exists in the
neighborhood of $\theta = 0$ and denote $L'(z, 0)$ by $L'(z)$. Then

$$L(z, \theta) = L(z, 0) + \theta L'(z) + o(\theta),$$

but $L(z, 0) = 1$ so that

$$L(z, \theta) = 1 + \theta L'(z) + o(\theta).$$

Thus if the alternative is that $\theta > 0$ but near 0 the locally most powerful test
will put those z's into the critical region which make $L'(z)$ largest.







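COROLLARY 3.3. If $h(x, \theta) = (2\pi)^{-1/2}e^{-(x-\theta)^2/2}$, i.e., the alternative hypothesis is
that the samples come from normal distributions with the same variance, then

$$L'(z) = \sum_{i=1}^{N^*} z_i E_{Ni} + N(n - n^*)\binom{N-1}{N^*-1}\int_{-\infty}^{\infty} H^{N^*-1}(x, 0)\,h^2(x, 0)\,[1 - H(x, 0)]^{N-N^*-1}\,dx,$$

where $E_{Ni}$ is the expected value of the ith smallest in a sample of N from a normal
distribution with the mean zero and variance one.

PROOF.

$$L'(z, \theta) = \frac{N!}{(N-N^*)!}\int W(w)\,dw\,\Big\{(n - n^*)\,g(w_{N^*})/[1 - G(w_{N^*})] + \sum_{i=1}^{N^*} z_i(w_i - \theta)\Big\}.$$

Hence

$$L'(z) = \frac{N!}{(N-N^*)!}\int\cdots\int_{-\infty<w_1<\cdots<w_{N^*}<\infty}\ \prod_{i=1}^{N^*}\big[h(w_i, 0)\,dw_i\big]\,[1 - H(w_{N^*}, 0)]^{N-N^*}\Big\{(n - n^*)\,h(w_{N^*}, 0)/[1 - H(w_{N^*}, 0)] + \sum_{i=1}^{N^*} z_i w_i\Big\}$$

$$= \sum_{i=1}^{N^*} z_i\,\frac{N!}{(i-1)!\,(N-i)!}\int_{-\infty}^{\infty} w\,H^{i-1}(w, 0)\,h(w, 0)\,[1 - H(w, 0)]^{N-i}\,dw + \frac{N!\,(n - n^*)}{(N^*-1)!\,(N-N^*)!}\int_{-\infty}^{\infty} H^{N^*-1}(w, 0)\,h^2(w, 0)\,[1 - H(w, 0)]^{N-N^*-1}\,dw.$$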

The portion of the statistic depending on the Ey; is the same as one proposed 
by Fisher and Yates for the uncensored case. The integral in the second part of 
the statistic has not been tabulated. When N* = N the statistic in the corollary 
becomes the Fisher-Yates statistic. 
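Since the integral in the second part of the statistic is not tabulated, a rough numerical evaluation may help fix ideas. The sketch below is an assumption-laden illustration and not the authors' procedure: it uses a composite Simpson rule on [-9, 9] (a hypothetical truncation), the names `expected_order_stat` and `score` are invented here, and the closed form coded for L'(z) is the one read off the proof, with the constant N!/((N* - 1)!(N - N*)!) written as N times a binomial coefficient.

```python
from math import comb, erf, exp, pi, sqrt


def phi(x):
    """Standard normal density, h(x, 0)."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def Phi(x):
    """Standard normal distribution function, H(x, 0)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def simpson(f, a=-9.0, b=9.0, steps=3600):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def expected_order_stat(i, N):
    """E_{Ni}: mean of the i-th smallest of N standard normal variables."""
    c = N * comb(N - 1, i - 1)
    return simpson(lambda x: c * x * Phi(x) ** (i - 1) * (1 - Phi(x)) ** (N - i) * phi(x))


def score(z, m, n):
    """L'(z) for a censored rank order z (sequence of 0/1 values)."""
    N, n_obs = m + n, len(z)
    left = n - sum(z)             # n - n*: second-sample values not yet observed
    value = sum(expected_order_stat(i, N) for i, zi in enumerate(z, start=1) if zi)
    if left > 0:
        c = N * left * comb(N - 1, n_obs - 1)
        value += simpson(lambda x: c * Phi(x) ** (n_obs - 1) * phi(x) ** 2
                         * (1 - Phi(x)) ** (N - n_obs - 1))
    return value


m = n = 2
# Null probabilities of the four rank orders of length N* = 2 (Corollary 3.1).
null_prob = {(0, 0): 1 / 6, (0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 6}
null_mean = sum(p * score(z, m, n) for z, p in null_prob.items())
print(null_mean)
```

A useful sanity check is that the null expectation of L'(Z) over the terminal rank orders is zero, because the probabilities of the terminal rank orders sum to one for every value of the parameter.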

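COROLLARY 3.4. If $H(x, \theta) = 1 - [1 - J(x)]^{1+\theta}$, where $\theta > -1$ and J(x)
is a distribution function having density j(x), i.e., the Lehmann alternative, then

$$\text{I.}\quad L(z, \theta) = \frac{N!}{(N-N^*)!}\,(1+\theta)^{n^*}\Big/\prod_{j=1}^{N^*}\Big[A + \sum_{i=j}^{N^*}(1 + \theta z_i)\Big],$$

where $A = m - m^* + (1 + \theta)(n - n^*)$, and

$$\text{II.}\quad \frac{d\,\ln L(z, \theta)}{d\theta}\Big|_{\theta=0} = n^* - (n - n^*)\sum_{j=1}^{N^*}(A + j)^{-1} - \sum_{j=1}^{N^*} z_j\Big[\sum_{i=N^*-j+1}^{N^*}(A + i)^{-1}\Big],$$

with A evaluated at $\theta = 0$, i.e., $A = N - N^*$.

PROOF.

$$L(z, \theta) = \frac{N!}{(N-N^*)!}\,(1+\theta)^{n^*}\int\cdots\int_{-\infty<w_1<\cdots<w_{N^*}<\infty}\ \prod_{i=1}^{N^*}\big\{j(w_i)[1 - J(w_i)]^{\theta z_i}\,dw_i\big\}\,[1 - J(w_{N^*})]^{m-m^*}[1 - J(w_{N^*})]^{(1+\theta)(n-n^*)}.$$

This can be integrated exactly by starting with $w_{N^*}$.

This is similar to Corollary 7.a.1 and Equation 7.c.2 of [2].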
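Part I of Corollary 3.4 is a closed form and can be evaluated exactly. The sketch below is a hedged illustration: the function names `lehmann_lr` and `p0` and the sizes m = n = 3 are hypothetical, and the product formula coded here is the one obtained by carrying out the integration from the last variable backwards, so it should be read as a check on that reading rather than as the authors' own computation.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def lehmann_lr(z, m, n, theta):
    """L(z, theta) under the Lehmann alternative for a censored rank order z."""
    N, n_obs = m + n, len(z)
    n_star = sum(z)
    m_star = n_obs - n_star
    A = (m - m_star) + (1 + theta) * (n - n_star)
    value = Fraction(factorial(N), factorial(N - n_obs)) * (1 + theta) ** n_star
    tail = 0                      # running sum of (1 + theta * z_i), i = j, ..., N*
    for zi in reversed(z):
        tail += 1 + theta * zi
        value /= A + tail
    return value


def p0(z, m, n):
    """Null probability of z from Corollary 3.1."""
    N, n_obs = m + n, len(z)
    return Fraction(comb(N - n_obs, m - z.count(0)), comb(N, m))


m = n = 3
theta = Fraction(1, 2)
total = sum(p0(z, m, n) * lehmann_lr(z, m, n, theta)
            for z in product((0, 1), repeat=2))
print(total, lehmann_lr((0, 1, 1), m, n, Fraction(0)))
```

Two properties make convenient checks: the likelihood ratio equals one at the null value of the parameter, and the probabilities P_0(z) L(z) of the terminal rank orders of a fixed-N* scheme sum to one.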

COROLLARY 3.5. If $H(x, \theta) = (1 - \theta)J(x) + \theta J^2(x)$, where $0 \le \theta \le 1$ and
J(x) is a distribution function with density j(x), then

$$(N + 1)L'(z) = 2\sum_{i=1}^{N^*} i z_i - n^*(N + 1) + (n - n^*)N^*.$$

PROOF.

$$L'(z, \theta) = \frac{N!}{(N-N^*)!}\int W(w)\,dw\,\Big\{\sum_{i=1}^{N^*}\frac{z_i[2J(w_i) - 1]}{1 - \theta + 2\theta J(w_i)} + \frac{(n - n^*)[J(w_{N^*}) - J^2(w_{N^*})]}{1 - (1 - \theta)J(w_{N^*}) - \theta J^2(w_{N^*})}\Big\}$$

and

$$L'(z) = \frac{N!}{(N-N^*)!}\int\cdots\int_{0<u_1<\cdots<u_{N^*}<1}\ \prod_{i=1}^{N^*}\big[du_i\big]\,(1 - u_{N^*})^{N-N^*}\Big[\sum_{i=1}^{N^*} z_i(2u_i - 1) + (n - n^*)u_{N^*}\Big].$$

The necessary integrals are of the Beta form.

When N* = N this reduces to a result of Lehmann [1]. Statistics of this form
have been introduced earlier by Sobel [3].
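The statistic of Corollary 3.5 needs only integer arithmetic. As a small sketch (the function names `score_cor_3_5` and `p0` are hypothetical, as are the example rank orders), the code below evaluates (N + 1)L'(z) and checks that its null expectation over the terminal rank orders of a fixed-N* scheme vanishes.

```python
from fractions import Fraction
from itertools import product
from math import comb


def score_cor_3_5(z, m, n):
    """(N + 1) L'(z) for a censored rank order z (sequence of 0/1 values)."""
    N, n_obs = m + n, len(z)
    n_star = sum(z)
    rank_sum = sum(i for i, zi in enumerate(z, start=1) if zi)
    return 2 * rank_sum - n_star * (N + 1) + (n - n_star) * n_obs


def p0(z, m, n):
    """Null probability of z from Corollary 3.1."""
    N, n_obs = m + n, len(z)
    return Fraction(comb(N - n_obs, m - z.count(0)), comb(N, m))


m, n = 4, 5
null_mean = sum(p0(z, m, n) * score_cor_3_5(z, m, n)
                for z in product((0, 1), repeat=3))
print(score_cor_3_5((0, 1, 0, 1, 1), 2, 3), null_mean)
```

For an uncensored rank order the last term drops out and the statistic is twice the rank sum of the second sample minus n(N + 1), a linear function of the Wilcoxon rank sum.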

Now assume that f(x) and g(x) have a monotone likelihood ratio, i.e., if x < y
then

$$\begin{vmatrix} f(x) & g(x) \\ f(y) & g(y) \end{vmatrix} \ge 0$$

with strict inequality for a set of positive Lebesgue measure in the (x, y) space.
Two other forms of the same condition are, for x < y,

$$f(x)g(y) \ge f(y)g(x),$$
$$f(x)/g(x) \ge f(y)/g(y).$$

The monotone likelihood alternatives include many of the common situations,
e.g., f(x) a normal density with mean zero and variance one and g(x) a normal
density with positive mean and variance one, or $f(x) = e^{-x}$ for x > 0 and zero
otherwise and $g(x) = (1 + \theta)^{-1}e^{-x/(1+\theta)}$ for x > 0 and zero otherwise, where
$\theta > 0$.

THEOREM 3.2. Assume $X_1, \dots, X_m, Y_1, \dots, Y_n$ are mutually independent
random variables. The X's have the density f(x) and the Y's have the density g(x),
where $f(x)g(y) \ge f(y)g(x)$ for x < y with strict inequality on a set of positive
Lebesgue measure in the (x, y) space.

a. If z and z′ are identical except that $z_i = z'_j = 0$ and $z_j = z'_i = 1$,
$1 \le i < j \le N^*$, then Pr (z) > Pr (z′) and L(z) > L(z′).

b. If z and z′ are identical except that $z_{N^*} = 0$ and $z'_{N^*} = 1$, and hence
$m'^* = m^* - 1$ and $n'^* = n^* + 1$, then L(z) > L(z′).

c. If z and z′ are identical except that $N'^* = N^* + 1$ and $z'_{N^*+1} = 0(1)$, then
L(z′) > L(z) (L(z) > L(z′)).

PROOF.

a. This is a simple analogue of Theorem 6.1 [2].

b. Let D = L(z) − L(z′). Then

$$D = \frac{N!}{(N-N^*)!}\int\cdots\int_{-\infty<w_1<\cdots<w_{N^*}<\infty}\ \prod_{i=1}^{N^*-1}\big[f^{1-z_i}(w_i)\,g^{z_i}(w_i)\,dw_i\big]\,dw_{N^*}\,[1 - F(w_{N^*})]^{m-m^*}[1 - G(w_{N^*})]^{n-n^*-1}\,q(w_{N^*}),$$

$$q(w_{N^*}) = f(w_{N^*})[1 - G(w_{N^*})] - g(w_{N^*})[1 - F(w_{N^*})].$$

To show D > 0 it is sufficient to show $q(w_{N^*}) > 0$. Start with
$f(x)g(y) - f(y)g(x) \ge 0$, if x < y. Multiply this inequality by dy and integrate from x to
$\infty$ obtaining $\int_x^\infty f(x)g(y)\,dy - \int_x^\infty g(x)f(y)\,dy > 0$ for some value of x. Now
replace x by $w_{N^*}$ and the last becomes $q(w_{N^*}) > 0$.

c. Let z″ be identical to z′ except $z''_{N^*+1} = 1$, where $z'_{N^*+1} = 0$. Then
Pr (Z = z) = Pr (Z = z′) + Pr (Z = z″), and

$$L(z) = \Pr(Z = z)/P_0(z) = \frac{\Pr(Z = z')\,P_0(z')}{P_0(z')\,P_0(z)} + \frac{\Pr(Z = z'')\,P_0(z'')}{P_0(z'')\,P_0(z)} = L(z')\frac{P_0(z')}{P_0(z)} + L(z'')\frac{P_0(z'')}{P_0(z)}.$$

Now from Part b one has L(z′) > L(z″) and thus

$$L(z) < L(z')[P_0(z') + P_0(z'')]/P_0(z) = L(z')$$

since $P_0(z') + P_0(z'') = P_0(z)$. This completes the proof.

When m = n = 2 we obtain diagram B with the aid of Theorem 3.2. An 
arrow leading from one rank order to the other means that the likelihood ratio 
of the first is greater than the second. Attached to the arrows are letters indicat- 
ing the portion of Theorem 3.2 used. Antecedent and redundant rank orders are 
set equal. 

Note that b of Theorem 3.2 is not needed in diagrams like diagram B since
typically (m and n > 3): L(0100) > L(0101) follows from a double application
of c, viz., L(0100) > L(010) > L(0101).
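The orderings of Theorem 3.2 can be spot-checked on a family with an explicit likelihood ratio. The sketch below assumes the Lehmann alternative of Corollary 3.4 with a parameter between -1 and 0; that choice is ours, made because a direct check suggests that the inequality f(x)g(y) ≥ f(y)g(x) holds for this family in that parameter range. The function name `lehmann_lr` and the sizes m = n = 3 are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def lehmann_lr(z, m, n, theta):
    """Closed-form L(z, theta) under the Lehmann alternative (Corollary 3.4, part I)."""
    N, n_obs, n_star = m + n, len(z), sum(z)
    A = (m - (n_obs - n_star)) + (1 + theta) * (n - n_star)
    value = Fraction(factorial(N), factorial(N - n_obs)) * (1 + theta) ** n_star
    tail = 0
    for zi in reversed(z):
        tail += 1 + theta * zi
        value /= A + tail
    return value


m = n = 3
theta = Fraction(-1, 2)           # assumed value in (-1, 0)
L = lambda z: lehmann_lr(z, m, n, theta)

violations = []
for length in range(0, m + n):
    for z in product((0, 1), repeat=length):
        if z.count(0) < m and z.count(1) < n:
            z0, z1 = z + (0,), z + (1,)
            if not (L(z0) > L(z1)):                      # part b
                violations.append(("b", z))
            if length > 0 and not (L(z0) > L(z) > L(z1)):  # part c
                violations.append(("c", z))
for z in product((0, 1), repeat=4):                      # part a
    for i in range(4):
        for j in range(i + 1, 4):
            if z[i] == 0 and z[j] == 1:
                swapped = list(z)
                swapped[i], swapped[j] = 1, 0
                if not (L(z) > L(tuple(swapped))):
                    violations.append(("a", z, i, j))
print(violations)
```

An empty list means that, for this one alternative, every comparison covered by parts a, b and c of the theorem goes in the stated direction; it is a consistency check, not a proof.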

The distributions used in Corollaries 3.3, 3.4, and 3.5 have monotone likelihood
ratios. Thus the locally most powerful rank order tests based on those
corollaries yield simple orderings of the rank orders which are compatible with
the partial orderings of Theorem 3.2. Theorem 3.2 and the resulting diagrams
will be found useful in constructing good decision procedures when the monotone
likelihood ratio assumption is acceptable and the sample sizes are relatively
small [2].





[Diagram B: partial ordering of the likelihood ratios of the rank orders for m = n = 2, running from 0011 to 11 = 1100.]


4. Additional problems. Before applying the results of this paper several
general as well as specific problems need discussion. Even the restricted class of
censoring schemes discussed explicitly is large, and the class of censoring schemes 
amenable to treatment is very large. Hence, reasons for concentrating on specific 
schemes should be developed. Some possibilities are: a. Use censoring schemes 
that are now used, i.e., fix N* as is done in some life testing problems. b. Use 
some optimality criterion, such as minimizing the expected number of observa- 
tions for a fixed level of significance and power (for some alternative). c. Reason 
by analogy and work with procedures that continue sampling so long as 
a < L(Z) < b and make the appropriate decision if this condition fails (a and 
b chosen constants). The large sample distribution theory should be developed. 
(The locally most powerful rank procedures are in a sense large sample 
procedures.) Intercomparisons of the “‘efficiencies’’ of the procedures being 
discussed here should be made with other procedures—parametric and non- 
parametric. Efficiency must include power considerations and cost of experimen- 
tation. 

For each censoring scheme the distribution of the number of observations 
required should be investigated under the null and alternative hypotheses. 
At the least the first two moments should be found. Some of these distributions 
should be tabulated and the large sample theory developed. Tables of the integral 
in Theorem 3.1 are desirable. When tables exist of the uncensored rank orders 
this is an easy task (see paragraph following Lemma 2.1). The exact and large 







sample distribution of the statistic in Corollary 3.3 should be found for several 
censoring schemes. In particular, values of the integral need computation. For 
Corollaries 3.4 and 3.5 it would also be desirable to obtain the exact and large 
sample distributions. Presumably the results for large samples will not only 
show limiting normal distributions but give information regarding efficiency. 
Diagrams resulting from Theorem 3.2 should be prepared for several combina- 
tions of sample sizes. When a complete diagram is given it is then possible to 
select out the portions relevant to a particular censoring scheme. These diagrams 
should yield uniformly most powerful rank order procedures when the sample 
sizes and levels of significance are relatively small. 
REFERENCES

[1] E. L. Lehmann, "The power of rank tests," Ann. Math. Stat., Vol. 24 (1953), pp. 23-42.

[2] I. Richard Savage, "Contributions to the theory of rank order statistics: the two-sample
case," Ann. Math. Stat., Vol. 27 (1956), pp. 590-615.

[3] Milton Sobel, "On a generalized Wilcoxon statistic for life testing," Proc. Working
Conference on the Theory of Reliability (April 17-19, 1957), New York University
and the Radio Corporation of America (1957), pp. 8-13.

[4] Chia Kuei Tsao, "An application of Massey's distribution of the maximum deviation
between two sample cumulative step functions," Ann. Math. Stat., Vol. 25 (1954),
pp. 587-592.








ON A RESULT BY M. ROSENBLATT CONCERNING THE 
VON MISES-SMIRNOV TEST 


By M. Fisz 
University of Warsaw 


1. Summary. Rosenblatt’s derivation [3] of the limiting distribution of the 
statistic (1) below contains an incorrect step.' A simple argument is presented 
that corrects Rosenblatt’s proof, so that his conclusion is shown to be valid. 


2. Rosenblatt's result. Let $x_k$ $(k = 1, \dots, n)$ and $y_j$ $(j = 1, \dots, m)$ be two
independent random samples from two populations with the same continuous
distribution function F(t). Let $S_1(t)$ and $S_2(t)$ denote the corresponding
empirical distribution functions. Lehmann [2] has suggested

$$\frac{mn}{n+m}\int [S_1(t) - S_2(t)]^2\,d\Big[\frac{nS_1(t) + mS_2(t)}{n+m}\Big] \tag{1}$$

as a test statistic for the two sample problem. Rosenblatt [3] has proved that
the statistic (1) has the same limiting distribution, when $n \to \infty$, $m \to \infty$,
$m/n \to \lambda > 0$, as the von Mises-Smirnov statistic (Smirnov [4]),

$$n\int [S_1(t) - F(t)]^2\,dF(t).$$
An essential role in Rosenblatt's proof is played by the equality

$$\frac{nm}{n+m}\int [S_1(t) - S_2(t)]^2\,d\Big[\frac{nS_1(t) + mS_2(t)}{n+m}\Big] = \frac{nm}{n+m}\Big\{\int [S_1(t) - S_2(t)]^2\,dt + \int [S_2(t) - t]^2\,d[S_1(t) - t] + \int [S_1(t) - t]^2\,d[S_2(t) - t]\Big\}, \tag{2}$$

where the non-restrictive assumption has been made that F(t) is the uniform
distribution function on [0, 1]. Now simple calculations show that (2) does not
hold. Set

$$A = \int [S_1(t) - S_2(t)]^2\,d\Big[\frac{nS_1(t) + mS_2(t)}{n+m}\Big], \tag{3}$$

$$B = \int [S_1(t) - S_2(t)]^2\,dt, \tag{4}$$

$$C = \int [S_2(t) - t]^2\,d[S_1(t) - t] + \int [S_1(t) - t]^2\,d[S_2(t) - t]. \tag{5}$$


Received April 17, 1959; revised February 18, 1960. 


1 This has been noted in a paper by J. Kiefer [1], which appeared after the present note 
was submitted. 


427 





428 M. FISZ 
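Let us assume that $S_1(t)$ and $S_2(t)$ are continuous from the right, and that the
observations in each sample are numbered in increasing order. We have
then (with probability 1, since $\Pr(x_{k_1} \ne x_{k_2},\ x_k \ne y_j,\ y_{j_1} \ne y_{j_2};\ k, k_1, k_2 = 1, \dots, n;\ j, j_1, j_2 = 1, \dots, m;\ k_1 \ne k_2,\ j_1 \ne j_2) = 1$) that

$$A = \frac{1}{n+m}\Big\{\sum_{k=1}^{n}\big[(k/n)^2 - 2(k/n)S_2(x_k) + S_2^2(x_k)\big] + \sum_{j=1}^{m}\big[(j/m)^2 - 2(j/m)S_1(y_j) + S_1^2(y_j)\big]\Big\}, \tag{6}$$

$$C = \frac{1}{n}\sum_{k=1}^{n}[S_2(x_k) - x_k]^2 + \frac{1}{m}\sum_{j=1}^{m}[S_1(y_j) - y_j]^2 - \Big[\int_0^{x_1} t^2\,dt + \sum_{k=1}^{n-1}\int_{x_k}^{x_{k+1}}\big((k/n) - t\big)^2\,dt + \int_{x_n}^{1}(1 - t)^2\,dt\Big] - \Big[\int_0^{y_1} t^2\,dt + \sum_{j=1}^{m-1}\int_{y_j}^{y_{j+1}}\big((j/m) - t\big)^2\,dt + \int_{y_m}^{1}(1 - t)^2\,dt\Big]$$

$$= \frac{1}{n}\sum_{k=1}^{n} S_2^2(x_k) + \frac{1}{m}\sum_{j=1}^{m} S_1^2(y_j) - \frac{2}{3} + \frac{1}{n^2}\sum_{k=1}^{n}[2k - 1 - 2nS_2(x_k)]x_k + \frac{1}{m^2}\sum_{j=1}^{m}[2j - 1 - 2mS_1(y_j)]y_j. \tag{7}$$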


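We find from (6) that

$$A - \frac{1}{n}\sum_{k=1}^{n} S_2^2(x_k) - \frac{1}{m}\sum_{j=1}^{m} S_1^2(y_j) + \frac{2}{3} = \frac{1}{n+m}\Big\{\sum_{k=1}^{n}\big[(k/n)^2 - 2(k/n)S_2(x_k) - (m/n)S_2^2(x_k)\big] + \sum_{j=1}^{m}\big[(j/m)^2 - 2(j/m)S_1(y_j) - (n/m)S_1^2(y_j)\big]\Big\} + \frac{2}{3}$$

$$= \frac{1}{n+m}\Big\{\sum_{k=1}^{n}(k/n)^2 + \sum_{j=1}^{m}(j/m)^2 + \frac{1}{nm}\Big(\sum_{k=1}^{n} k^2 + \sum_{j=1}^{m} j^2\Big)\Big\} - \frac{n+m}{nm}\Big\{\sum_{k=1}^{n}\Big[\frac{nS_1(x_k) + mS_2(x_k)}{n+m}\Big]^2 + \sum_{j=1}^{m}\Big[\frac{nS_1(y_j) + mS_2(y_j)}{n+m}\Big]^2\Big\} + \frac{2}{3}$$

$$= \frac{1}{n+m}\Big\{\sum_{k=1}^{n}(k/n)^2 + \sum_{j=1}^{m}(j/m)^2 + \frac{1}{nm}\Big(\sum_{k=1}^{n} k^2 + \sum_{j=1}^{m} j^2\Big)\Big\} - \frac{n+m}{nm}\sum_{r=1}^{n+m}\big[r/(n+m)\big]^2 + \frac{2}{3} = \frac{1}{6nm}. \tag{8}$$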





VON MISES-SMIRNOV TEST 429 


On the other hand, we have

$$B = \int_0^1 S_1^2(t)\,dt + \int_0^1 S_2^2(t)\,dt - 2\int_0^1 S_1(t)S_2(t)\,dt = -\frac{1}{n^2}\sum_{k=1}^{n}[2k - 1 - 2nS_2(x_k)]x_k - \frac{1}{m^2}\sum_{j=1}^{m}[2j - 1 - 2mS_1(y_j)]y_j. \tag{9}$$

Relations (7)-(9) imply that A − B − C = 1/(6nm). Consequently the left
side of (2) differs from the right one by 1/[6(n + m)].
Although equality (2) does not hold, the assertion of Rosenblatt's theorem
remains true, since $1/[6(n + m)] \to 0$ as $n \to \infty$ and $m \to \infty$.
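The constant discrepancy can be confirmed numerically for any two samples in (0, 1). The following sketch is illustrative only: the helper names `ecdf`, `integral_sq_dev` and `abc`, the seed, and the sample sizes are hypothetical, and C is computed under the assumption that it is the sum of the two Stieltjes integrals of the squared deviation of one empirical distribution function from t, taken with respect to the other one minus t.

```python
import random


def ecdf(sample):
    """Right-continuous empirical distribution function of the sample."""
    s = sorted(sample)
    return lambda t: sum(v <= t for v in s) / len(s)


def integral_sq_dev(sample):
    """Exact integral over [0, 1] of (S(t) - t)^2 dt for the empirical d.f. S."""
    s = sorted(sample)
    knots = [0.0] + s + [1.0]
    total = 0.0
    for k in range(len(knots) - 1):
        c, a, b = k / len(s), knots[k], knots[k + 1]   # S(t) = c on [a, b)
        total += ((b - c) ** 3 - (a - c) ** 3) / 3.0
    return total


def abc(xs, ys):
    n, m = len(xs), len(ys)
    S1, S2 = ecdf(xs), ecdf(ys)
    A = sum((S1(t) - S2(t)) ** 2 for t in xs + ys) / (n + m)
    pts = [0.0] + sorted(xs + ys) + [1.0]
    B = sum((S1(a) - S2(a)) ** 2 * (b - a) for a, b in zip(pts, pts[1:]))
    C = (sum((S2(x) - x) ** 2 for x in xs) / n - integral_sq_dev(ys)
         + sum((S1(y) - y) ** 2 for y in ys) / m - integral_sq_dev(xs))
    return A, B, C


random.seed(0)
xs = [random.random() for _ in range(5)]    # n = 5
ys = [random.random() for _ in range(7)]    # m = 7
A, B, C = abc(xs, ys)
print(A - B - C, 1 / (6 * 5 * 7))
```

Although B and C each depend on the sample values, their sum depends only on the ranks, which is why the difference from A is the same for every pair of samples of the given sizes.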


REFERENCES 


[1] J. Kiefer, "K-sample analogues of the Kolmogorov-Smirnov and Cramér-v. Mises
tests," Ann. Math. Stat., Vol. 30 (1959), pp. 420-447.

[2] E. L. Lehmann, "Consistency and unbiasedness of certain nonparametric tests,"
Ann. Math. Stat., Vol. 22 (1951), pp. 165-179.

[3] M. Rosenblatt, "Limit theorems associated with variants of the von Mises statistic,"
Ann. Math. Stat., Vol. 23 (1952), pp. 617-623.

[4] N. Smirnoff, "Sur la distribution de ω²," Comptes Rendus de l'Académie des Sciences,
Vol. 202 (1936), pp. 449-452.





A SEQUENTIAL DESIGN FOR THE TWO ARMED BANDIT 


By Walter Vogel
Universität Tübingen and University of Chicago¹


1. Introduction. Let the two random variables (r.v.) X and Y, with E(X) = p 
and E(Y) = q, describe the outcomes of two experiments, Ex I and Ex II. 
An experimenter, who does not know the values of p and q, has to perform a 
sequence of experiments, and at each step he may choose between Ex I and 
Ex II. He has to stop after n steps, and he wishes to maximise the sum of all 
outcomes. His decision between Ex I and Ex II at the kth step will depend on 
the corresponding decisions at prior steps and on the outcomes of these prior 
experiments. We call a plan, which fixes his sequence of decisions according to 
his previous knowledge, a strategy. 

Robbins [6] shows that it is easy to find a strategy so that the arithmetic
mean of n outcomes tends ($n \to \infty$) towards max (p, q) with probability 1.
Bradt, Johnson and Karlin [3] try to find a best strategy for fixed n rather than
asymptotically. They assume known a priori distributions for the values of p
and q. For other approaches see Robbins [7], Isbell [5], Bellman [2] and Vogel [8].

The purpose of this paper is to describe a class of strategies, which results from 
the following kind of restriction. In the first 2k steps we perform each of Ex I 
and Ex II k times. Then the rest of the n — 2k steps are made either with Ex I 
alone or with Ex II alone. The decision whether to continue with Ex I or with 
Ex II will be made with the help of a sequential probability ratio test for double 
dichotomies. Therefore k is a r.v. that will be denoted by K when appropriate. 

Strategies of this kind are not exceptionally good ones (in the sense of the loss- 
function defined in Section 3). But when a strategy is applied in practice it may 
be found economic to do only one sort of experiment for most of the steps. 
Perhaps the equipment of the other sort of experiment can be used for other 
purposes; perhaps the shift from one experiment to the other is costly. For such 
reasons it may be quite natural to use only those strategies described above. 
Another justification for treating this class of strategies is given by the results
in [8], for which Theorems 2 and 3 of this paper are needed.

Section 2 contains some auxiliary material. Except for Theorem 1, which we 
give in a slightly more general form than needed for the rest of this paper, 
nothing here is new, but we found it convenient to summarize some definitions 
and easy-to-prove formulas in one section. 

The loss-function and an approximation to the loss-function will be derived
in Section 3. Section 4 is devoted to a minimax theorem for the approximate
loss-function. In Section 5 we give some results for $n \to \infty$. It is assumed in
Sections 2-5 that the r.v.'s X and Y are binomially distributed. In Section 6 we
consider a more general case.


Received January 19, 1959; revised January 18, 1960.

¹ Research carried out in part at the Statistical Research Center, University of Chicago,
under the sponsorship of the Statistics Branch, Office of Naval Research. Reproduction
in whole or in part is permitted for any purpose of the United States Government.

² Present address: Mathematisches Institut der Universität Tübingen, Germany.


2. Some remarks about sequential plans. Let $Z_i$ $(i = 1, 2, 3, \ldots)$ be a sequence
of r.v.'s with $E(Z_i) = m_i$, and let $|Z_i| \le A < \infty$. We make no assumptions
about the dependence of the $Z_i$. Let $N > 0$ be an integer-valued r.v. with
$E(N) < \infty$. We assume that $E(Z_i \mid N = n) = m_i$ for $i > n$.

THEOREM 1: From the assumptions made above it follows that

$$E\Bigl(\sum_{i=1}^{N} Z_i\Bigr) = E\Bigl(\sum_{i=1}^{N} m_i\Bigr).$$


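PROOF: It is obviously sufficient to assume $m_i = 0$ and to prove
$E(\sum_{i=1}^{N} Z_i) = 0$. From $E(N) < \infty$ it follows that
$0 \le nP(N \ge n) \le \sum_{k=n}^{\infty} kP(N = k) \to 0$ $(n \to \infty)$. Therefore

$$\Bigl|E\Bigl(\sum_{i=1}^{n} Z_i \,\Big|\, N \ge n\Bigr) P(N \ge n)\Bigr| \le A\,nP(N \ge n) \to 0 \quad\text{and} \tag{2.1}$$

$$\Bigl|E\Bigl(\sum_{i=1}^{N} Z_i \,\Big|\, N \ge n\Bigr) P(N \ge n)\Bigr| \le A\,E(N \mid N \ge n)P(N \ge n) = A\sum_{k=n}^{\infty} kP(N = k) \to 0.$$

From

$$E\Bigl(\sum_{i=1}^{n} Z_i\Bigr) = E\Bigl(\sum_{i=1}^{n} Z_i \,\Big|\, N < n\Bigr) P(N < n) + E\Bigl(\sum_{i=1}^{n} Z_i \,\Big|\, N \ge n\Bigr) P(N \ge n)$$

and the last relation it follows that

$$E\Bigl(\sum_{i=1}^{n} Z_i \,\Big|\, N < n\Bigr) P(N < n) \to E\Bigl(\sum_{i=1}^{n} Z_i\Bigr) = 0. \tag{2.2}$$

As $E(Z_i \mid N = k) = 0$ for $i > k$ we have

$$E\Bigl(\sum_{i=1}^{n} Z_i \,\Big|\, N < n\Bigr) = E\Bigl(\sum_{i=1}^{N} Z_i \,\Big|\, N < n\Bigr) + E\Bigl(\sum_{i=N+1}^{n} Z_i \,\Big|\, N < n\Bigr) = E\Bigl(\sum_{i=1}^{N} Z_i \,\Big|\, N < n\Bigr). \tag{2.3}$$

Now

$$E\Bigl(\sum_{i=1}^{N} Z_i\Bigr) = E\Bigl(\sum_{i=1}^{N} Z_i \,\Big|\, N < n\Bigr) P(N < n) + E\Bigl(\sum_{i=1}^{N} Z_i \,\Big|\, N \ge n\Bigr) P(N \ge n).$$

(2.1) shows that the last term converges to zero. The other term on the right
side converges to $E(\sum_{i=1}^{n} Z_i) = 0$ because of (2.2) and (2.3). This proves the
theorem.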

If the $Z_i$ are independent and identically distributed and if $\{N = n\}$ is defined
on the first n $Z_i$ the theorem reduces to the well known formula

$$E\Bigl(\sum_{i=1}^{N} Z_i\Bigr) = E(N)E(Z_1).$$


We describe now a random walk with absorbing barriers. Let $X_i$ and $Y_i$
$(i = 1, 2, 3, \ldots, \mu)$ be independent r.v.'s with $P(X_i = 1) = p = 1 - P(X_i = 0)$,
$P(Y_i = 1) = q = 1 - P(Y_i = 0)$ and $0 < p, q < 1$. Let further $Z_i = X_i - Y_i$
and $U_k = \sum_{i=1}^{k} Z_i$. We define some events (a is a positive integer).

$$A_{1k} = \{-a < U_i < +a \text{ for } i < k \text{ and } U_k = +a\}, \quad (k = 1, 2, \ldots, \mu)$$
$$A_{2k} = \{-a < U_i < +a \text{ for } i < k \text{ and } U_k = -a\}, \quad (k = 1, 2, \ldots, \mu)$$
$$B_\nu = \{-a < U_i < +a \text{ for } i \le \mu \text{ and } U_\mu = \nu\}, \quad (\nu = -a+1, -a+2, \ldots, a-2, a-1)$$
$$A_1 = \sum_{k=1}^{\mu} A_{1k}, \qquad A_2 = \sum_{k=1}^{\mu} A_{2k}, \qquad B = \sum_{\nu=-a+1}^{a-1} B_\nu.$$

We have $P(A_1) + P(A_2) + P(B) = 1$.

Let $p(1 - q) = r$, $q(1 - p) = s$, $1 - r - s = t$ and $u = r/s$. Here r, s and
t are the probabilities that $Z_i = 1$, $-1$ or $0$.

We will prove 

LEMMA 1:

$$P(A_{1k} \mid A_{1k} + A_{2k}) = u^a/(u^a + 1) = \gamma,$$
$$P(A_{2k} \mid A_{1k} + A_{2k}) = 1/(u^a + 1) = 1 - \gamma.$$

PROOF: $P(A_{1k}) = \sum c_{\rho\sigma\tau}\, r^\rho s^\sigma t^\tau$, where $c_{\rho\sigma\tau}$ is the number of admissible paths
which the point $(i, U_i)$ describes in the plane before reaching $(k, a)$. The paths
have $\rho$ steps up, $\sigma$ steps down and $\tau$ horizontal steps with the conditions
$\rho + \sigma + \tau = k$ and $\rho - \sigma = a$. The summation-range for $\tau$ is $0 \le \tau \le k - a$.
By introducing k and a instead of $\rho$ and $\sigma$ we get

$$P(A_{1k}) = (r/s)^{a/2}(rs)^{k/2} \sum_{\tau} c_{ka\tau}\, t^\tau (rs)^{-\tau/2}$$

and likewise

$$P(A_{2k}) = (r/s)^{-a/2}(rs)^{k/2} \sum_{\tau} c_{ka\tau}\, t^\tau (rs)^{-\tau/2}.$$

Taking account of $P(A_{1k} + A_{2k}) = P(A_{1k}) + P(A_{2k})$ the lemma follows
immediately.

From Lemma 1 follows

$$P(A_1) = \gamma \sum_{k=1}^{\mu} P(A_{1k} + A_{2k}) = (1 - P(B))\,u^a/(u^a + 1), \tag{2.4}$$
$$P(A_2) = (1 - \gamma) \sum_{k=1}^{\mu} P(A_{1k} + A_{2k}) = (1 - P(B))/(u^a + 1).$$
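Lemma 1 and (2.4) can be checked numerically for a small case. The following is only a sketch: the function name `absorption_probabilities` and the parameter values are illustrative assumptions, not part of the paper. It propagates the exact distribution of the unabsorbed walk step by step and compares the conditional absorption probabilities with $\gamma = u^a/(u^a+1)$.

```python
def absorption_probabilities(p, q, a, mu):
    """Exact P(A_1k), P(A_2k) for k = 1..mu and P(B) (hypothetical helper)."""
    r = p * (1 - q)          # P(Z_i = +1)
    s = q * (1 - p)          # P(Z_i = -1)
    t = 1 - r - s            # P(Z_i = 0)
    width = 2 * a - 1        # interior positions -a+1, ..., a-1
    alive = [0.0] * width    # alive[j] = P(not absorbed, U_k = j - (a - 1))
    alive[a - 1] = 1.0       # the walk starts at U_0 = 0
    upper, lower = [], []
    for _ in range(mu):
        upper.append(alive[-1] * r)   # step from a-1 up to +a
        lower.append(alive[0] * s)    # step from -a+1 down to -a
        nxt = [0.0] * width
        for j, mass in enumerate(alive):
            nxt[j] += mass * t
            if j + 1 < width:
                nxt[j + 1] += mass * r
            if j >= 1:
                nxt[j - 1] += mass * s
        alive = nxt
    return upper, lower, sum(alive)


p, q, a, mu = 0.6, 0.4, 3, 50     # illustrative values only
upper, lower, prob_b = absorption_probabilities(p, q, a, mu)
u = (p * (1 - q)) / (q * (1 - p))
gamma = u**a / (u**a + 1)
# Conditional probability of ending at +a, given absorption at step k
ratios = [x / (x + y) for x, y in zip(upper, lower) if x + y > 0]
print(gamma, min(ratios), max(ratios), prob_b)
```

Every entry of `ratios` should agree with `gamma` up to rounding, whatever the step k, which is the content of the lemma.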


We now define an integral valued r.v., K, by $\{K = k\} = A_{1k} + A_{2k}$ for
$k < \mu$ and $\{K = \mu\} = A_{1\mu} + A_{2\mu} + B$. An application of Theorem 1 gives

$$E\Bigl(\sum_{i=1}^{K} (X_i + Y_i)\Bigr) = (p + q)E(K) \quad\text{and} \tag{2.5}$$
$$E(U_K) = E\Bigl(\sum_{i=1}^{K} (X_i - Y_i)\Bigr) = (p - q)E(K).$$

From (2.4) follows

$$E(U_K) = aP(A_1) - aP(A_2) + \sum_{-a<\nu<a} \nu P(B_\nu) = a\,\frac{u^a - 1}{u^a + 1}\,(1 - P(B)) + \sum_{-a<\nu<a} \nu P(B_\nu).$$

As $P(B) \to 0$ for $\mu \to \infty$, we have
$E(K) = (p - q)^{-1}E(U_K) \to (a/(p - q))((u^a - 1)/(u^a + 1))$.
Obviously $E(K)$ is a monotone increasing function of $\mu$ so that

$$E(K) \le \frac{a}{p - q}\,\Bigl(\frac{u^a - 1}{u^a + 1}\Bigr). \tag{2.6}$$
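As a hedged illustration of (2.5) and (2.6), the sketch below computes $E(K)$ and $E(U_K)$ exactly for one truncated walk. The helper name `walk_moments` and the parameter values are assumptions made for this example only.

```python
def walk_moments(p, q, a, mu):
    """Exact E(K) and E(U_K) for the walk truncated at mu (hypothetical helper)."""
    r, s = p * (1 - q), q * (1 - p)
    t = 1 - r - s
    alive = {0: 1.0}            # position -> probability of being there unabsorbed
    e_k = 0.0
    e_u = 0.0
    for k in range(1, mu + 1):
        nxt = {}
        for pos, mass in alive.items():
            for step, prob in ((1, r), (-1, s), (0, t)):
                new = pos + step
                if abs(new) >= a:              # absorbed at +a or -a at step k
                    e_k += k * mass * prob
                    e_u += new * mass * prob
                else:
                    nxt[new] = nxt.get(new, 0.0) + mass * prob
        alive = nxt
    for pos, mass in alive.items():            # event B: K = mu, U_mu = pos
        e_k += mu * mass
        e_u += pos * mass
    return e_k, e_u


p, q, a, mu = 0.6, 0.4, 3, 50                  # illustrative values only
e_k, e_u = walk_moments(p, q, a, mu)
u = (p * (1 - q)) / (q * (1 - p))
bound = a / (p - q) * (u**a - 1) / (u**a + 1)  # right side of (2.6)
print(e_k, e_u, (p - q) * e_k, bound)
```

The second relation in (2.5) says that `e_u` and `(p - q) * e_k` coincide, and (2.6) says that `e_k` stays below `bound`.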
In Section 5 we are interested in sequences of such random walks as described
above by a, $\mu$, $U_k$ and K. In order to define a sequence of random walks let
$n = 2\mu$ be an even number and let $X_i^{(n)}$ and $Y_i^{(n)}$ $(i = 1, 2, 3, \ldots, \mu)$ be independent
r.v.'s with $P(X_i^{(n)} = 1) = p_n = 1 - P(X_i^{(n)} = 0)$ and
$P(Y_i^{(n)} = 1) = q_n = 1 - P(Y_i^{(n)} = 0)$. We assume $p_n \ge q_n$,
$p_n - q_n = mn^{-1/2} + o(n^{-1/2})$, $p_n, q_n \to p$ and $0 < p < 1$. Let $a_n$ be an integer such that

$$a_n = an^{1/2} + o(n^{1/2}), \qquad a > 0.$$

Putting $p_n(1 - q_n)/q_n(1 - p_n) = u_n$ we get

$$u_n^{a_n} \to v^a \quad (\text{with } v = \exp(m/p(1 - p))). \tag{2.7}$$

PROOF OF (2.7):

$$\ln u_n^{a_n} = a_n \ln\Bigl(\frac{p_n(1 - q_n)}{q_n(1 - p_n)}\Bigr) = (an^{1/2} + o(n^{1/2})) \ln\Bigl(1 + \frac{mn^{-1/2}}{p(1 - p)} + o(n^{-1/2})\Bigr) \to am/p(1 - p).$$
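The limit (2.7) is easy to observe numerically. In this sketch the particular sequences $p_n = p + m n^{-1/2}/2$, $q_n = p - m n^{-1/2}/2$ and $a_n = \mathrm{round}(a n^{1/2})$ are assumptions chosen only because they satisfy the stated conditions; the function name is hypothetical.

```python
import math


def u_power(n, a, m, p):
    """u_n ** a_n for one admissible choice of p_n, q_n, a_n (illustrative)."""
    a_n = round(a * math.sqrt(n))          # integer with a_n = a*sqrt(n) + o(sqrt(n))
    d = m / math.sqrt(n)                   # p_n - q_n
    p_n, q_n = p + d / 2, p - d / 2
    u_n = p_n * (1 - q_n) / (q_n * (1 - p_n))
    return u_n ** a_n


a, m, p = 0.292, 1.89, 0.5                 # illustrative constants
limit = math.exp(a * m / (p * (1 - p)))    # v ** a with v = exp(m / (p(1 - p)))
for n in (10**2, 10**4, 10**6):
    print(n, u_power(n, a, m, p), limit)
```

For small n the rounding of $a_n$ dominates the discrepancy; it fades as n grows.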







Formula (2.7) will be useful in Section 5.

Let the r.v.'s $K^{(n)}$ and $U_k^{(n)}$ be defined correspondingly to K and $U_k$. We
want to evaluate the limit (for $n \to \infty$) of $n^{-1}E(K^{(n)})$. In the following argument
we do not give a proof but proceed heuristically. "Let the reader who
has never used this sort of reasoning exhibit the first counter example" (see
[4] p. 395).

The point $(k, n^{-1/2}U_k^{(n)})$ describes a random walk in the plane. Absorbing
barriers are $n^{-1/2}U_k^{(n)} = \pm a + o(1)$ and $k = n/2 = \mu$. $K^{(n)}$ is the "time"
taken to reach the boundaries. We will now construct a suitable Wiener-process
and compute $E(T)$ where T is the time taken to reach the boundaries in this
Wiener-process. Then we conclude heuristically that

$$n^{-1}E(K^{(n)}) \to E(T). \tag{2.8}$$


We have $E(n^{-1/2}U_k^{(n)}) = n^{-1/2}kE(X_i^{(n)} - Y_i^{(n)}) = kmn^{-1} + k\,o(n^{-1})$, and
$\operatorname{Var}(n^{-1/2}U_k^{(n)}) = n^{-1}k \operatorname{Var}(X_i^{(n)} - Y_i^{(n)}) \approx 2kp(1 - p)n^{-1}$. Now let $n \to \infty$ and
$k \to \infty$ so that $k/n = t$ is fixed. Then $n^{-1/2}U_k^{(n)}$ is in the limit normally distributed
with mean $tm$ and variance $2tp(1 - p)$. We will therefore approximate
$n^{-1/2}U_k^{(n)}$ by a Wiener-process $V_t$ with absorbing barriers given by $V_t = \pm a$
and by $t = 1/2$ and with mean and variance as stated above. The formulas for
this simple kind of process are well known (see [1] p. 47). Let

$$f(x, t) = (4\pi p(1 - p)t)^{-1/2} \exp\Bigl(\frac{2mx - m^2 t}{4p(1 - p)}\Bigr) \sum_{s=-\infty}^{\infty} (-1)^s \exp\Bigl(-\frac{(x - 2as)^2}{4p(1 - p)t}\Bigr);$$

then $f(x, t)$ satisfies the diffusion equation

$$\frac{\partial f}{\partial t} + m\,\frac{\partial f}{\partial x} = p(1 - p)\,\frac{\partial^2 f}{\partial x^2}$$

and the boundary-conditions $f(\pm a, t) = 0$. For the time T taken to reach the
boundaries $x = \pm a$ the probability density is

$$g(t) = -\frac{\partial}{\partial t} \int_{-a}^{+a} f(x, t)\, dx$$

(see [1] p. 48) so that $E(T) = \int_0^{1/2} t\,g(t)\, dt + \int_{-a}^{+a} \tfrac{1}{2} f(x, \tfrac{1}{2})\, dx$. The first term is
related to absorption on the boundaries $x = \pm a$, and the second term takes
into account absorption on the boundary $t = 1/2$. Integration by parts gives

$$E(T) = \int_0^{1/2} \int_{-a}^{+a} f(x, t)\, dx\, dt. \tag{2.9}$$

(2.9) gives an expression for the limit in (2.8).


By the same heuristic argument we can get an expression for the limit of the
probability of the event

$$B^{(n)} = \{-a_n < U_k^{(n)} < +a_n,\ k \le n/2\}. \tag{2.10}$$

Let $a_n = an^{1/2} + o(n^{1/2})$, $p_n - q_n = mn^{-1/2} + o(n^{-1/2})$, $a > 0$, $m > 0$, $p_n, q_n \to p$
and $0 < p < 1$. It then follows that

$$P(B^{(n)}) \to \int_{-a}^{+a} f(x, \tfrac{1}{2})\, dx. \tag{2.11}$$

3. The loss-function. We come now back to the problem stated in Section 1.
Let $P(X = 1) = 1 - P(X = 0) = p$ and $P(Y = 1) = q = 1 - P(Y = 0)$
and let $n = 2\mu$ be an even integer. X and Y are the two r.v.'s between which
the experimenter has to choose at each step and n is the total number of steps,
fixed in advance. The strategy runs as follows: Begin sequentially with pairs of
Ex I and Ex II (i.e. of X and Y) until a decision is reached to end this part of
the strategy and to continue with Ex I or Ex II alone. Thus, observe $X_1$, $Y_1$
in the first two steps, and then decide either to observe another pair or to continue
entirely with Ex I, or to continue entirely with Ex II. While pairs are still
being observed we may describe the first 2k steps by $X_1, Y_1; X_2, Y_2; \ldots; X_k, Y_k$
(all assumed to be independent). The decision at this point is based upon
$U_k = \sum_{i=1}^{k} (X_i - Y_i)$ and an integer $a > 0$. If $-a < U_k < +a$, another
pair is observed; if $U_k \ge +a$ $(\le -a)$ we stop observing pairs and use only
Ex I (Ex II) for the rest of the n steps. Let K be the random number of observed
pairs $(0 < K \le n/2)$. a, $\mu$ $(n = 2\mu)$ and the r.v.'s $U_k$ and K form a
lay-out as described in Section 2 and we may use formulas (2.4)-(2.6).

The expected sum for all n steps, say W, is

$$W = E\Bigl(\sum_{i=1}^{K} (X_i + Y_i)\Bigr) + \gamma\,E\Bigl(\sum_{i=2K+1}^{n} X_i\Bigr) + (1 - \gamma)\,E\Bigl(\sum_{i=2K+1}^{n} Y_i\Bigr).$$

Here $\gamma$ is defined as in Lemma 1, Section 2. The first term is related to the part
where pairwise observations are made, the second and third terms stem from
the possibilities to continue with Ex I alone or with Ex II alone.

Using Theorem 1 and Lemma 1 we get

$$W = (p + q)E(K) + \frac{pr^a + qs^a}{r^a + s^a}\,(n - 2E(K)), \tag{3.1}$$

where $r = p(1 - q)$ and $s = q(1 - p)$.

The best possible expected outcome of the whole sum is $n \max(p, q)$. We
define the loss-function $L_n = L(a, p, q)$ as $L_n = n \max(p, q) - W$. Let
$\sigma = \max(p, q)$ and $\tau = \min(p, q)$, then

$$L_n = \frac{n(\sigma - \tau)}{u^a + 1} + (\sigma - \tau)\Bigl(\frac{u^a - 1}{u^a + 1}\Bigr) E(K),$$

with $u = \sigma(1 - \tau)/\tau(1 - \sigma)$. We remark that the loss-function is symmetric
in p and q, i.e. that $L(a, \sigma, \tau) = L(a, \tau, \sigma)$. A strategy simply consists in the
choice of an integer a $(0 < a \le n/2)$.
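The loss formula can be compared with a direct evaluation of $n\max(p,q) - W$. The sketch below is an illustration under assumed parameter values, and `exact_loss` is a hypothetical helper name: it follows the strategy path by path (in distribution) and then checks the closed expression for $L_n$ and its symmetry in p and q.

```python
def exact_loss(n, a, p, q):
    """Return (L_n, E(K)) by exact propagation of the paired phase (n even)."""
    mu = n // 2
    r, s = p * (1 - q), q * (1 - p)
    t = 1 - r - s
    alive = {0: 1.0}
    e_k = 0.0
    tail = 0.0      # expected outcome of the n - 2K steps made with one experiment
    for k in range(1, mu + 1):
        nxt = {}
        for pos, mass in alive.items():
            for step, prob in ((1, r), (-1, s), (0, t)):
                new, w = pos + step, mass * prob
                if new >= a:                 # continue with Ex I
                    e_k += k * w
                    tail += (n - 2 * k) * p * w
                elif new <= -a:              # continue with Ex II
                    e_k += k * w
                    tail += (n - 2 * k) * q * w
                else:
                    nxt[new] = nxt.get(new, 0.0) + w
        alive = nxt
    e_k += mu * sum(alive.values())          # no decision reached: K = n/2
    w_total = (p + q) * e_k + tail
    return n * max(p, q) - w_total, e_k


n, a = 100, 3                                # illustrative values only
loss_pq, e_k = exact_loss(n, a, 0.6, 0.4)
loss_qp, _ = exact_loss(n, a, 0.4, 0.6)
sigma, tau = 0.6, 0.4
u = sigma * (1 - tau) / (tau * (1 - sigma))
formula = n * (sigma - tau) / (u**a + 1) + (sigma - tau) * (u**a - 1) / (u**a + 1) * e_k
print(loss_pq, loss_qp, formula)
```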







Suppose, now, that a fixed n has been given and that we intend to use a strategy,
i.e. we want to choose an a. Our intention is to minimize $L_n$. The question
of which a to choose cannot be answered unambiguously. When some knowledge
about p and q in the form of an a priori distribution is given, we can try to
compute the expectation of $L_n$ with respect to this a priori distribution. This
will be a function of a only. As a is an integer between 0 and n/2 there exists
at least one a, which gives a minimum. The actual computation will be difficult,
because no exact formula for $E(K)$ is available. We therefore use an approximation.
Let

$$M_n = M(a, \sigma, \tau) = \frac{n(\sigma - \tau)}{u^a + 1} + a\Bigl(\frac{u^a - 1}{u^a + 1}\Bigr)^2. \tag{3.2}$$

From (2.6) follows $M_n \ge L_n$ and for $n \to \infty$ (but a fixed) we have $L_n - M_n \to 0$.
As long as a is small compared to n, we will use $M_n$ as an approximation of $L_n$.
We illustrate the use of $M_n$ by an example. Let (p, q) be either (0.6, 0.4) or
(0.4, 0.6) and n = 100. Then $\sigma - \tau = 0.2$ and $u = 2.25$. An easy computation
shows that a = 3 is the only integer that makes $M_{100}$ a minimum. Since
a is small compared to n, the approximation is justified.
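The "easy computation" of the example can be reproduced in a few lines. This is a sketch only; the function name `approx_loss` is an assumption, and the formula is (3.2) as stated above.

```python
def approx_loss(n, a, sigma, tau):
    """Approximate loss M_n of (3.2) for barrier a (hypothetical helper name)."""
    u = sigma * (1 - tau) / (tau * (1 - sigma))
    z = u ** a
    return n * (sigma - tau) / (z + 1) + a * ((z - 1) / (z + 1)) ** 2


n, sigma, tau = 100, 0.6, 0.4
values = {a: approx_loss(n, a, sigma, tau) for a in range(1, n // 2 + 1)}
best_a = min(values, key=values.get)
for a in range(1, 6):
    print(a, round(values[a], 3))
print("minimizing a:", best_a)
```

The first term falls with a while the second grows, so the minimum is found by a simple scan over the admissible integers.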


4. A minimax theorem for the approximate loss-function. In this section we
are concerned only with the approximate loss-function $M_n$, and we regard a
not as an integer but as a continuous variable $(0 < a \le n/2)$.

THEOREM 2: For $n \ge 4$ we have

$$\min_a \max_{\sigma,\tau} M(a, \sigma, \tau) = \max_{\sigma,\tau} \min_a M(a, \sigma, \tau).$$

We will prove the theorem by showing

$$M(a_n, \sigma, \tau) \le M(a_n, \sigma_n, \tau_n) \le M(a, \sigma_n, \tau_n). \tag{4.1}$$

THEOREM 3: The asymptotic behavior of $a_n$, $\sigma_n$ and $\tau_n$ is given by
$a_n = an^{1/2} + o(n^{1/2})$ with $a = 0.292\ldots$, $\sigma_n - \tau_n = mn^{-1/2} + o(n^{-1/2})$ with $m = 1.89\ldots$,
$\sigma_n, \tau_n \to 1/2$ and $u_n^{a_n} \to 9.06\ldots$.

We prove Theorems 2 and 3 together and proceed in several steps. 

(i) $M_n$ is monotone increasing in $(\sigma - \tau)$ and we compute $\max(\sigma - \tau)$
under the condition u = constant. We find $\max(\sigma - \tau) = (u^{1/2} - 1)/(u^{1/2} + 1)$
and $\sigma = 1 - \tau = u^{1/2}/(u^{1/2} + 1)$. Then (4.1) is equivalent to

$$M(a_n, u) \le M(a_n, u_n) \le M(a, u_n),$$

where

$$M(a, u) = \frac{n}{u^a + 1}\Bigl(\frac{u^{1/2} - 1}{u^{1/2} + 1}\Bigr) + a\Bigl(\frac{u^a - 1}{u^a + 1}\Bigr)^2.$$

(ii) The saddle-point of $M(a, u)$ can be obtained by setting

$$\frac{\partial M}{\partial a} = \frac{\partial M}{\partial u} = 0.$$

We will show that these equations have a common solution, $a_n$, $u_n$, that (under
(iii) and (iv)) $a_n$ gives a minimum for $M(a, u_n)$, and that $u_n$ gives a maximum
for $M(a_n, u)$.


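Setting $\partial M/\partial a = 0$ gives (after division by $(u^a + 1)^{-2}u^a \ln u$)

$$\frac{(u^a - 1)^2}{u^a \ln u} + 4a\,\frac{u^a - 1}{u^a + 1} - n\,\frac{u^{1/2} - 1}{u^{1/2} + 1} = 0. \tag{4.2}$$

Setting $\partial M/\partial u = 0$ gives (after division by $au^{a-1}(u^a + 1)^{-2}$)

$$\frac{n\,u^{1/2}(u^a + 1)}{a\,u^a (u^{1/2} + 1)^2} - n\,\frac{u^{1/2} - 1}{u^{1/2} + 1} + 4a\,\frac{u^a - 1}{u^a + 1} = 0. \tag{4.3}$$

From (4.2) and (4.3) it follows that

$$n\,\frac{u^{1/2} - 1}{u^{1/2} + 1} = \frac{a(u - 1)(u^a - 1)^2}{u^{1/2}(u^a + 1)\ln u}.$$

This, introduced in (4.2), gives

$$\frac{u - 1}{u^{1/2}\ln u} = \frac{u^a + 1}{u^a \ln u^a} + \frac{4}{u^a - 1}. \tag{4.4}$$

We now rewrite (4.4) and (4.2), but set $u^a = z$, thus

$$\frac{u - 1}{u^{1/2}\ln u} = \frac{z + 1}{z \ln z} + \frac{4}{z - 1}, \tag{4.5}$$

$$n\Bigl(\frac{u^{1/2} - 1}{u^{1/2} + 1}\Bigr)\ln u = \frac{(z - 1)^2}{z} + 4\,\frac{z - 1}{z + 1}\,\ln z. \tag{4.6}$$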

A simultaneous solution of (4.5) and (4.6) would yield a simultaneous solution
of (4.2) and (4.3), which is desired. Now, the left side of (4.5) is a monotone
increasing function of u. As u ranges from 1 to $\infty$ the left side ranges from
1 to $\infty$ also. The right side of (4.5) is a monotone decreasing function of z.
Let c be the unique solution of

$$1 = ((z + 1)/(z \ln z)) + (4/(z - 1)), \qquad c = 9.06\ldots \tag{4.7}$$

so that (4.5) defines a function, $z = A(u)$ say, which is monotone decreasing
from c to 1.

The same kind of argument shows that (4.6) defines a function, $z = B(u)$
say, which is monotone increasing from 1 to $\infty$. So the two functions have
exactly one point in common. Its abscissa is $u_n$ and its ordinate, $z_n = u_n^{a_n}$,
gives $a_n$. This shows that there is exactly one pair, $u_n$, $a_n$, which satisfies
$\partial M/\partial a = \partial M/\partial u = 0$. It is easily seen that $1 < u_n < \infty$ and that $0 < a_n < n/2$.

(iii) We show that $M(a, u_n)$ has a minimum for $a = a_n$. The left side of
(4.2), times a positive factor, is just $\partial M/\partial a$. This left side is monotone increasing,
therefore it is negative for $a < a_n$ and positive for $a > a_n$. Then this is
true for $\partial M/\partial a$ also and M has the desired minimum.


(iv) To show that $M(a_n, u)$ has a maximum for $u = u_n$ is a bit more difficult,
and, before doing so, we investigate the behavior of $a_n$ as a function of n.
The function $z = B(u)$ depends on n and we will write it as $B_n(u)$. As is seen
from (4.6), the following holds for all $u > 1$: If $n_1 > n_2$ then $B_{n_1}(u) > B_{n_2}(u)$
and if $n \to \infty$ then $B_n(u) \to \infty$. It is therefore clear that, if $n \to \infty$, then $u_n$
is monotone decreasing towards 1 and $A(u_n)$ is monotone increasing towards c.
Now $A(u_n) = z_n = u_n^{a_n}$ and

$$a_n = \ln z_n / \ln u_n \tag{4.8}$$

so that $a_n$ is monotone increasing in n.
For $n \to \infty$ and $u \to 1$ the left side of (4.6) may be approximated by

$$(n \ln^2 u)/4$$

and the right side (as $z \to c$) by a constant. From this we see that $\ln u_n \sim bn^{-1/2}$
and from (4.8) that $a_n \sim an^{1/2}$, where b and a are constants. A more careful
examination gives $\ln u_n = bn^{-1/2} + o(n^{-1/2})$, $\sigma_n - \tau_n = mn^{-1/2} + o(n^{-1/2})$ and
$a_n = an^{1/2} + o(n^{1/2})$.

A numerical computation shows that $a_4 > 1/2$. As $n \ge 4$ was assumed, we
may use $a_n > 1/2$ in the following proof that $M(a_n, u)$ has a maximum for $u = u_n$.

$\partial M/\partial u$ is, aside from the positive factor $au^{a-1}(u^a + 1)^{-2}(u^{1/2} - 1)(u^{1/2} + 1)^{-1}$,
given by

$$\frac{(u^a + 1)\,n\,u^{1/2}}{a\,u^a (u - 1)} - n + 4a\Bigl(\frac{u^a - 1}{u^a + 1}\Bigr)\Bigl(\frac{u^{1/2} + 1}{u^{1/2} - 1}\Bigr).$$

The first term is monotone decreasing in u, the second term is constant, and,
for $a > 1/2$, the third term is monotone decreasing also. But then $\partial M/\partial u$ goes from
positive to negative values at $u = u_n$, which is what we wished to show.

(v) The next step is the determination of the constant in $n^{-1/2}a_n \to a$. From
(4.7) we find $c = 9.06\ldots$. Then (4.6) gives, for $n \to \infty$,
$\tfrac{1}{4} n \ln^2 u_n \approx (c - 1)^2 c^{-1} + 4(c - 1)(c + 1)^{-1}\ln c = 14.21\ldots$ or $\ln u_n \approx 7.54\,n^{-1/2}$ and
by (4.8) we have

$$a_n n^{-1/2} \to (\ln c)/7.54 = 0.292\ldots = a.$$

In a similar way it can be shown that $(\sigma_n - \tau_n)n^{1/2} \to m = 1.89\ldots$. Furthermore,
we have $z_n = u_n^{a_n} \to c = 9.06$, and from $\sigma_n = 1 - \tau_n$ and $\sigma_n - \tau_n \to 0$
it follows that $\sigma_n, \tau_n \to 1/2$. Thus Theorems 2 and 3 are proved.
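The constants of step (v) can be recomputed from (4.7), the limit of (4.6) and (4.8). The sketch below assumes a plain bisection (the paper used a slide rule) and uses the relation $\sigma - \tau \approx (\ln u)/4$ for u near 1; the names `rhs_45` and `solve_c` are hypothetical. Small deviations from the printed intermediate values 14.21 and 7.54 are to be expected at slide-rule precision.

```python
import math


def rhs_45(z):
    """Right side of (4.5) and (4.7) as a function of z."""
    return (z + 1) / (z * math.log(z)) + 4 / (z - 1)


def solve_c(lo=2.0, hi=50.0, tol=1e-12):
    """Bisection for rhs_45(z) = 1; rhs_45 is decreasing on (1, infinity)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rhs_45(mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = solve_c()
# Limit of the right side of (4.6); it equals lim n * ln(u_n)**2 / 4
limit_46 = (c - 1) ** 2 / c + 4 * (c - 1) / (c + 1) * math.log(c)
b = math.sqrt(4 * limit_46)      # ln u_n ~ b * n**-0.5
a = math.log(c) / b              # a_n ~ a * n**0.5, by (4.8)
m = b / 4                        # sigma_n - tau_n ~ m * n**-0.5
print(c, limit_46, b, a, m)
```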





In order to compute $a_n = a(n)$, we first get $z = A(u)$ from (4.5). In formula
(4.6), or

$$n = \Bigl[\frac{(z - 1)^2}{z} + 4\,\frac{z - 1}{z + 1}\,\ln z\Bigr] \frac{u^{1/2} + 1}{(u^{1/2} - 1)\ln u},$$

we put $z = A(u)$ and get $n = n(u)$. Formula (4.8), or $a_n = (\ln A(u))/\ln u$,
gives $a = a(u)$. So we have gotten a representation of $a(n)$ in terms of the
parameter u. This allows us to compute $a(n)n^{-1/2}$. We find $n^{-1/2}a(n) = 0.292$
for $n \ge 100$, and for n = 70 the third decimal is influenced for the first time by
two units. The whole computation was made with a slide rule. $n^{-1/2}a_n$ changes
only slowly with n. We therefore propose to compute the sequential plan (for
$n \ge 100$) by the simple formula

$$a_n = [0.292\,n^{1/2}].$$
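The parametric representation described above can be sketched as follows. This is an illustration, not the author's computation: the function names are hypothetical, both inversions use bisection with assumed brackets, and the monotonicity of $n(u)$ is taken from the text.

```python
import math


def lhs_45(u):
    return (u - 1) / (math.sqrt(u) * math.log(u))


def rhs_45(z):
    return (z + 1) / (z * math.log(z)) + 4 / (z - 1)


def A(u):
    """z = A(u) from (4.5); rhs_45 is decreasing in z (bracket is an assumption)."""
    lo, hi, target = 1.0 + 1e-9, 20.0, lhs_45(u)
    for _ in range(200):
        mid = (lo + hi) / 2
        if rhs_45(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n_of_u(u):
    """n = n(u) from (4.6) with z = A(u)."""
    z = A(u)
    d = (math.sqrt(u) - 1) / (math.sqrt(u) + 1)
    return ((z - 1) ** 2 / z + 4 * (z - 1) / (z + 1) * math.log(z)) / (d * math.log(u))


def a_of_n(n):
    """Invert n(u) by bisection on ln u, then apply (4.8)."""
    lo, hi = 1e-3, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if n_of_u(math.exp(mid)) > n:    # n(u) decreases as u grows
            lo = mid
        else:
            hi = mid
    u = math.exp((lo + hi) / 2)
    return math.log(A(u)) / math.log(u)


for n in (4, 70, 100, 1000):
    print(n, a_of_n(n), a_of_n(n) / math.sqrt(n))
```

The printed ratio $a(n)n^{-1/2}$ can then be compared with the constant 0.292 proposed for $n \ge 100$.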


$M_n$ is a good approximation only as long as $P(B^{(n)})$ is small (for the definition
of $B^{(n)}$ see (2.10)). The asymptotic behavior of $a_n$, $\sigma_n$ and $\tau_n$ is such that
we can apply (2.11), which shows that $P(B^{(n)})$ does not vanish in the limit.
Therefore $M_n$ should not be used asymptotically if both the experimenter and
nature use the strategies derived in this section.


5. The loss-function as $n \to \infty$. In the previous section we used the approximation
M because no exact formula for the truncated sequential procedure is
available. If the number of steps tends towards infinity the random walk will
become a Wiener-process and we can use the results from the end of Section
2. In this way, we will get another approximation for $L_n$, valid when n is large.
Let

$$a_n = an^{1/2} + o(n^{1/2}), \qquad a > 0,$$
$$\sigma_n - \tau_n = mn^{-1/2} + o(n^{-1/2}), \qquad \sigma_n, \tau_n \to p, \qquad 0 < p < 1.$$

($a_n$, $\sigma_n$ and $\tau_n$ have now different meanings than those in Section 4).

THEOREM 4: $\lim_{n\to\infty} n^{-1/2}L_n(a_n, \sigma_n, \tau_n) = L_\infty(a, m, p)$, where $L_\infty$ is given by
Formula (5.1) below.

PROOF:

$$n^{-1/2}L_n = \frac{(\sigma_n - \tau_n)n^{1/2}}{u_n^{a_n} + 1} + (\sigma_n - \tau_n)n^{1/2}\Bigl(\frac{u_n^{a_n} - 1}{u_n^{a_n} + 1}\Bigr) n^{-1}E(K^{(n)}).$$

We use (2.7)-(2.9) and get

$$L_\infty = m(v^a + 1)^{-1} + m(v^a - 1)(v^a + 1)^{-1}E(T), \tag{5.1}$$

where $v = \exp(m/p(1 - p))$ and $E(T)$ is given by formula (2.9).

$L_\infty$ has a cumbersome formula because $E(T)$ has one, and it would be worthwhile
to try to find a value $a_0$ which gives a saddle-point, i.e. for which

$$L_\infty(a_0, m, p) \le L_\infty(a_0, m_0, p_0) \le L_\infty(a, m_0, p_0).$$

Then $a_n = a_0 n^{1/2}$ could be used as an approximation to a minimax strategy.
We can make only one step in this direction, namely we can prove that $L_\infty$ is
monotone increasing in m for fixed v and a.

The first term of $L_\infty$, $m(v^a + 1)^{-1}$, surely is monotone increasing and we will
show that $mE(T)$ is also. Putting $2mt = \lambda$, we get

$$mE(T) = m\int_0^{1/2}\!\!\int_{-a}^{+a} f(x, t)\, dx\, dt = \frac{1}{2}\int_0^{m}\!\!\int_{-a}^{+a} \Bigl(\frac{2\pi\lambda p(1 - p)}{m}\Bigr)^{-1/2} \exp\Bigl(\frac{4x - \lambda}{8p(1 - p)/m}\Bigr) \sum_{s=-\infty}^{\infty} (-1)^s \exp\Bigl(-\frac{(x - 2as)^2}{2\lambda p(1 - p)/m}\Bigr)\, dx\, d\lambda.$$

This shows that, if $v = \exp(m/p(1 - p))$ is fixed, then $mE(T)$ is monotone
increasing in m (the integrand is positive and depends only on v).
The maximum of m, if v is fixed, is $m_{\max} = (\ln v)/4$. Putting $m = m_{\max}$
(and consequently $p(1 - p) = 1/4$) we get

$$L_\infty = \frac{\ln v}{4(v^a + 1)} + \frac{1}{4}\ln v\,\Bigl(\frac{v^a - 1}{v^a + 1}\Bigr) E(T),$$

with

$$E(T) = \int_0^{1/2}\!\!\int_{-a}^{+a} (\pi t)^{-1/2} \exp\Bigl(\frac{8x\ln v - t\ln^2 v}{16}\Bigr) \sum_{s=-\infty}^{\infty} (-1)^s \exp\Bigl(-\frac{(x - 2as)^2}{t}\Bigr)\, dx\, dt.$$

Here $L_\infty$ is a function of v and a only. We strongly suspect that this function
has a saddle-point, but we did not succeed in proving it.


6. Generalization to other random variables. In order to generalize the
procedure to other than binomial r.v.'s we have to make strong assumptions
about the mathematical form of their distribution functions. Let

$$P(V(t) < v) = \int_{-\infty}^{v} f(x, t)\, d\mu(x), \qquad E(V(t)) = p(t),$$

$$\ln \frac{f(x, t_1) f(y, t_2)}{f(x, t_2) f(y, t_1)} = z(x, y; t_1, t_2).$$

We make the following two assumptions:

$$z(x, y; t_1, t_2) = g(t_1, t_2)h(x, y), \tag{6.1}$$

where g does not depend on x and y and h does not depend on $t_1$ and $t_2$;

$$g(t_1, t_2) > 0 \text{ whenever } p(t_1) > p(t_2). \tag{6.2}$$

Now let the outcomes of Ex I and Ex II be described by the r.v.'s $V(t_1)$ and
$V(t_2)$ so that $X = V(t_1)$ and $Y = V(t_2)$, and that the density for the pair Ex I,
Ex II is $f(x, t_1)f(y, t_2)$. Besides this hypothesis $H_1$ we consider the hypothesis
$H_2$ that the pair Ex I, Ex II has the density $f(x, t_2)f(y, t_1)$, i.e. that $X = V(t_2)$
and $Y = V(t_1)$. All computations will be done under $H_1$.

Let $Z = g(t_1, t_2)h(X, Y)$ and $U_k = \sum_{i=1}^{k} Z_i$. The $Z_i$ are independent
realizations of Z. We consider the following random walk in the (k, u) plane.
The walk starts at (0, 0). Absorbing barriers are (i) $u = a$ and (ii) $u = -a$.
There should be a third absorbing barrier at $k = n/2$ (n is assumed to be even),
but we do the following computations without regard to it, and therefore get
only approximations. As long as the walking point is not yet absorbed, its
general position is $(k, U_k)$. The usual approximation for the probability of being
absorbed at (i) is $\gamma = e^a/(e^a + 1)$ (see [1], p. 95). The conditional probability,
$\gamma_k$ say, for absorption at (i), provided the walk ends after k steps, is approximately
$\gamma_k = \gamma$. As this may not be well known, we prove it.

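The hypotheses $H_1$ and $H_2$ are such that $\gamma_k(H_1) = 1 - \gamma_k(H_2)$ and
$P(K = k \mid H_1) = P(K = k \mid H_2)$. Here K is the random number of the step at which
the point is absorbed. We define the event $\mathcal{A}$ as

$$\mathcal{A} = \{U_k \ge a;\ |U_i| < a \text{ for } i < k\} = \Bigl\{\prod_{i=1}^{k} f(X_i, t_1)f(Y_i, t_2) \ge e^a \prod_{i=1}^{k} f(X_i, t_2)f(Y_i, t_1);\ |U_i| < a \text{ for } i < k\Bigr\}.$$

Then, for all k with $P(K = k) \ne 0$,

$$\gamma_k P(K = k) = P(\mathcal{A} \mid H_1) = \int_{\mathcal{A}} \prod_{i=1}^{k} f(x_i, t_1)f(y_i, t_2) \prod_{i=1}^{k} d\mu(x_i)\, d\mu(y_i)$$
$$\ge e^a \int_{\mathcal{A}} \prod_{i=1}^{k} f(x_i, t_2)f(y_i, t_1) \prod_{i=1}^{k} d\mu(x_i)\, d\mu(y_i) = e^a P(\mathcal{A} \mid H_2) = e^a(1 - \gamma_k)P(K = k).$$

It follows that $\gamma_k \ge e^a(1 - \gamma_k)$ or $\gamma_k \ge e^a/(e^a + 1)$, and therefore

$$e^a/(e^a + 1) = \gamma = \sum_{k=1}^{\infty} \gamma_k P(K = k) \ge e^a/(e^a + 1).$$

But then $\gamma_k = e^a/(e^a + 1) = \gamma$.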

A strategy is as follows: we first take pairs X, Y of observations, say k times,
and the rest of the $n - 2k$ observations are made either with X alone or with
Y alone. The number k and whether to continue with X or with Y is given by
the sequential plan. (We choose X when the random walk stops at (i)).

The expected outcome W of all n experiments is approximately

$$W = (p(t_1) + p(t_2))E(K) + (n - 2E(K))(\gamma p(t_1) + (1 - \gamma)p(t_2)). \tag{6.3}$$

This follows in the same way as equation (3.1).
Our strategy is completely symmetric in the treatments of Ex I and Ex II.
The loss $L = n \max(p(t_1), p(t_2)) - W$ does not change when the names of
the experiments are changed. We therefore may assume that $p(t_1) \ge p(t_2)$
and we compute the loss under this assumption.

$$L = \frac{n(p(t_1) - p(t_2))}{e^a + 1} + a\,\frac{p(t_1) - p(t_2)}{E(Z \mid H_1)}\Bigl(\frac{e^a - 1}{e^a + 1}\Bigr)^2,$$

where the relations $E(K) \cdot E(Z) = E(U_K) = a(e^a - 1)/(e^a + 1)$ and
$\gamma = e^a/(e^a + 1)$ have been used.

To use the strategy practically we must be able to compute Z from X and Y;
i.e. $g(t_1, t_2)$ must be known. As it is essential that no complete information about
$t_1$ and $t_2$ is available, we proceed as follows:

Let $a = \alpha g(t_1, t_2)$, draw a plan with absorbing barriers at $u = \pm\alpha$ and use
$U_k = \sum_{i=1}^{k} Z_i$ (with $Z = h(X, Y)$) in this plan. This means only a change of
scale (for $g > 0$). Then, with $u = \exp g(t_1, t_2)$, we have

$$L = \frac{n(p(t_1) - p(t_2))}{u^\alpha + 1} + \alpha\,\frac{p(t_1) - p(t_2)}{E(h(X, Y) \mid H_1)}\Bigl(\frac{u^\alpha - 1}{u^\alpha + 1}\Bigr)^2.$$

Now our strategy is feasible as soon as we have decided which $\alpha$ to use, for
the functional form of h is known. The loss is a function of our strategy $\alpha$ and
of the pair $t_1$, $t_2$ (but not of the ordered pair).

To give an example, let $f(x, t) = (2\pi)^{-1/2} \exp(-(x - t)^2/2)$. Then
$z = (t_1 - t_2)(x - y)$. We choose $g = t_1 - t_2$ and $h = x - y$ for then $g > 0$ whenever
$p(t_1) = t_1$ is greater than $t_2 = p(t_2)$. Further

$$L = n(t_1 - t_2)/(u^\alpha + 1) + \alpha(u^\alpha - 1)^2(u^\alpha + 1)^{-2}$$

with $u = \exp(t_1 - t_2)$ is the loss computed under the assumption $t_1 \ge t_2$.
Without assumptions about $t_1$ and $t_2$ the loss will be

$$L = n\eta/(u^\alpha + 1) + \alpha(u^\alpha - 1)^2(u^\alpha + 1)^{-2}$$

with $\eta = |t_1 - t_2|$ and $u = \exp \eta$.
We can easily find a minimax solution. The maximum of $\eta$, if u is constant,
is $\eta_{\max} = \ln u$. Inserting this in L we have

$$L = n \ln u/(u^\alpha + 1) + \alpha(u^\alpha - 1)^2(u^\alpha + 1)^{-2}. \tag{6.4}$$

Now the formula

$$M = n\,\frac{u^{1/2} - 1}{u^{1/2} + 1}\Big/(u^a + 1) + a(u^a - 1)^2(u^a + 1)^{-2}$$

from Section 4 (at the end of (i)) gives, as $n \to \infty$ and $u \to 1$,

$$M = n \ln u/4(u^a + 1) + a(u^a - 1)^2(u^a + 1)^{-2};$$

and it was shown that there is a minimax solution $a_n n^{-1/2} \to 0.292$. A comparison
with (6.4) gives the following minimax solution for the above example:

$$\alpha = 0.292\,(4n)^{1/2} = 0.584\,n^{1/2}.$$

For another example, let $P(X = 1) = p = 1 - P(X = 0)$ and $P(Y = 1) = q = 1 - P(Y = 0)$.
Let the likelihood-quotient be Q, then $\ln Q = (x - y) \ln(p(1 - q)/q(1 - p))$.
With $u = \exp g(p, q) = p(1 - q)/q(1 - p)$ for $p > q$ and $u = q(1 - p)/p(1 - q)$
for $p < q$ and with $\sigma - \tau = |p - q|$ we get (3.2) as loss-function.


REFERENCES 


[1] M. S. Bartlett, "An Introduction to Stochastic Processes," Cambridge University
Press, 1955.

[2] R. Bellman, "A problem in the sequential design of experiments," Sankhyā, Vol. 16
(1956) pp. 221-229.

[3] R. N. Bradt, S. M. Johnson, and S. Karlin, "On sequential designs for maximising
the sum of n observations," Ann. Math. Stat., Vol. 27 (1956) pp. 1060-1074.

[4] J. L. Doob, "Heuristic approach to the Kolmogorov-Smirnov theorems," Ann. Math.
Stat., Vol. 20 (1949) pp. 393-403.

[5] J. R. Isbell, "On a problem of Robbins," Ann. Math. Stat., Vol. 30 (1959) pp. 606-610.

[6] H. E. Robbins, "Some aspects of the sequential design of experiments," Bull. Amer.
Math. Soc., Vol. 55 (1952), pp. 527-535.

[7] H. Robbins, "A sequential decision problem with a finite memory," Proc. Nat. Acad.
Sci., Vol. 42 (1956), pp. 920-923.

[8] Walter Vogel, "An asymptotic minimax theorem for the two armed bandit problem,"
Ann. Math. Stat., Vol. 31 (1960), pp. 444-451.





AN ASYMPTOTIC MINIMAX THEOREM FOR THE TWO 
ARMED BANDIT PROBLEM 


By Walter Vogel¹
University of Chicago and Universität Tübingen²


1. Introduction. Let Ex I and Ex II be two experiments, the outcomes of
which are described by the two random variables X and Y. Let $P(X = 1) = p = 1 - P(X = 0)$,
$P(Y = 1) = q = 1 - P(Y = 0)$ and $0 < p, q < 1$.
An experimenter has to do n experiments, one after another, and at every step
he may choose between Ex I and Ex II. He does not know the values of p and q
and he wants to maximize the sum of all outcomes. Therefore he will choose a
strategy, i.e. a procedure which tells him which experiment to use at the kth
step as a function of his previous choices and the previous outcomes of the experiments.
The question how to find a suitable strategy is known as the problem
of the two armed bandit. For approaches other than the one used in this paper
see [1], [2], [4], [5] and [6].

We will measure the value of a strategy by a loss function. Let $\Pi_k$ be the unconditional
probability of choosing Ex I at the kth step. The expected value of
the performed experiment at the kth step will be

$$\Pi_k p + (1 - \Pi_k)q = p - (p - q)(1 - \Pi_k) = q - (q - p)\Pi_k.$$

We define as loss at the kth step:

$$\max(p, q) - (\Pi_k p + (1 - \Pi_k)q) = (p - q)(1 - \Pi_k)$$

if $p \ge q$ or $(q - p)\Pi_k$ if $p \le q$. The loss $L(p, q)$ for the whole game is then

$$(p - q)\sum_{k=1}^{n} (1 - \Pi_k) \quad\text{or}\quad (q - p)\sum_{k=1}^{n} \Pi_k.$$

In $L(p, q)$ and $\Pi_k(p, q)$ the first argument is always related to Ex I. Let
$\sigma = \max(p, q)$ and $\tau = \min(p, q)$; then

$$L(\sigma, \tau) = (\sigma - \tau)\sum_{k=1}^{n} (1 - \Pi_k(\sigma, \tau))$$

and

$$L(\tau, \sigma) = (\sigma - \tau)\sum_{k=1}^{n} \Pi_k(\tau, \sigma).$$


As we do not suppose any previous knowledge about p and q, it seems natural
to use only strategies which are symmetric in Ex I and Ex II, i.e. for which
$L(\sigma, \tau) = L(\tau, \sigma)$. Every strategy s can be made symmetric. Define s′ just as
s, but with Ex I and Ex II interchanged, and then choose s or s′ with probabilities
1/2, 1/2.


Received April 2, 1959; revised January 18, 1960.

¹ Research carried out in part at the Statistical Research Center, University of Chicago,
under the sponsorship of the Statistics Branch, Office of Naval Research. Reproduction in
whole or in part is permitted for any purpose of the United States Government.

² Present address: Mathematisches Institut der Universität Tübingen, Germany.

We give an example: Let s be: use Ex I all the time. The loss is $L(p, q) = 0$
for $p \ge q$ and $L(p, q) = n(q - p)$ for $p \le q$. The corresponding symmetric
strategy is: choose Ex I all the time or Ex II all the time with probabilities 1/2,
1/2; the loss is then $L(p, q) = n|p - q|/2$.
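The symmetrization in this example can be written out directly. The sketch below is only an illustration with hypothetical function names; it uses the fact that the loss of s′ at (p, q) equals the loss of s at (q, p).

```python
def loss_always_ex1(n, p, q):
    """Loss of the strategy 'use Ex I all the time' (Pi_k = 1 for every k)."""
    return 0.0 if p >= q else n * (q - p)


def loss_symmetrized(n, p, q):
    """Loss when s or s' is chosen with probabilities 1/2, 1/2."""
    # s' is s with Ex I and Ex II interchanged, so its loss at (p, q)
    # is the loss of s at (q, p).
    return 0.5 * loss_always_ex1(n, p, q) + 0.5 * loss_always_ex1(n, q, p)


n = 100
for p, q in ((0.6, 0.4), (0.4, 0.6), (0.5, 0.5)):
    print(p, q, loss_symmetrized(n, p, q), n * abs(p - q) / 2)
```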

For a symmetric strategy $L(\sigma, \tau) = L(\tau, \sigma)$, and we can therefore write

$$L(p, q) = \frac{\sigma - \tau}{2} \sum_{k=1}^{n} (1 - \Pi_k(\sigma, \tau) + \Pi_k(\tau, \sigma)). \tag{1.1}$$

This is also the loss for an arbitrary strategy if the possibilities $p = \sigma$, $q = \tau$
and $p = \tau$, $q = \sigma$ have a priori probabilities 1/2, 1/2. We shall always use (1.1)
as the loss-function.

Besides the strategy of the experimenter there is a strategy of “‘nature”’ 
which consists in choosing a pair p, g. The use of (1.1) as loss function may be 
interpreted in two ways. 

First interpretation: Nature's strategy is to choose a pair $\sigma$, $\tau$ and then to
play either Ex I with $\sigma$ and Ex II with $\tau$ or vice versa. The experimenter is
free to use any strategy.

Second interpretation: Nature is free to use any strategy but the experimenter
is restricted to symmetric strategies.

Let s be a strategy of the experimenter and let t be a strategy of nature. We
will write $L(s, t)$ in place of $L(p, q)$ in order to exhibit the dependence of L
on both strategies.


2. Statement of the theorems. In this paper we are interested in sequences of
strategies. Both the experimenter and nature have to choose strategies for
every n. Let $S = \{s_n\}$ be a sequence of strategies of the experimenter and let
$T = \{t_n\}$ be a sequence of strategies of nature. This defines a sequence

$$\{L_n = L(s_n, t_n)\}$$

of loss functions. We use the "order of infinity" of this sequence to construct
a "weak" loss-function $l(S, T)$. For $\lambda$ large enough we have $L_n = o(n^\lambda)$. Now
let

$$l(S, T) = \inf\{\lambda \mid L_n = o(n^\lambda)\}. \tag{2.1}$$

The following theorems will be proved.
THEOREM 1. $\min_S \max_T l(S, T) = \max_T \min_S l(S, T) = 1/2$, i.e. there are sequences
$S_0$ and $T_0$ such that for every sequence S and for every sequence T

$$l(S_0, T) \le l(S_0, T_0) = \tfrac{1}{2} \le l(S, T_0).$$







An example of a sequence $T_0$ is given by

$$|p_n - q_n| = mn^{-1/2} + o(n^{-1/2}); \qquad p_n, q_n \to p \tag{2.2}$$

where $m > 0$ and $0 < p < 1$.

An example of a sequence $S_0$ is given by the strategies defined in [7]. These
strategies are given by numbers a (a serves to construct a sequential plan),
and in order to get a sequence $S_0$ we must choose

$$a_n = a \cdot n^{1/2} + o(n^{1/2}), \qquad a > 0. \tag{2.3}$$

We will reserve the letters $S_0$ and $T_0$ for sequences of strategies for which
(2.3) and (2.2) hold.

Of course the strategies So will not be the only minimax strategies for the 
weak loss-function 1(S, 7). There may be a large class of such strategies. To 
pick an especially good one out of this class one has to use a stronger criterion 
than 1. 

THEOREM 2. Let $T_0 = \{t_n^{(0)}\}$ be a sequence as defined by (2.2). Then we have
for every sequence S

$$\liminf\, (L(s_n, t_n^{(0)})/n^{1/2}) \ge C_2 > 0. \tag{2.4}$$

$C_2$ depends on the values of m and p in (2.2).

COROLLARY TO THEOREM 2. Let $\{t_n^{(0)}\}$ be defined by $|p_n - q_n| = 0.849\,n^{-1/2}$,
$p_n, q_n \to 1/2$. Then we have for every S

$$\liminf\, (L(s_n, t_n^{(0)})/n^{1/2}) \ge 0.1876. \tag{2.5}$$

THEOREM 3. Let $\{s_n^{(0)}\}$ be the sequence of minimax strategies as defined in [7]
(see also (4.2) in this paper). For these strategies, (2.3) is valid with $a = 0.292$
and we have for every sequence T

$$\limsup\, (L(s_n^{(0)}, t_n)/n^{1/2}) < 0.376. \tag{2.6}$$

THEOREM 4. Let $S_0 = \{s_n^{(0)}\}$ be a sequence of strategies as defined by (2.3). Then
we have for every sequence T

$$\limsup\, (L(s_n^{(0)}, t_n)/n^{1/2}) \le C_1 < \infty. \tag{2.7}$$

$C_1$ depends on the value of a in (2.3).

Remark: If the random variables X and Y of Ex I and Ex II are normally 
distributed with common known variance, we can prove virtually the same 
theorems. 

Theorem 1 follows immediately from Theorems 2 and 3 or 4. The proofs of
Theorems 3 and 4 depend on results of [7]. The minimum over all possible $C_2$
of Theorem 4 is 0.96; therefore Theorem 3 is not merely a corollary to Theorem 4.
The proof of Theorem 2 is independent of [7]. The main idea of the proof of
Theorem 2 is as follows: Construct a new game which obviously gives a smaller
loss than the old one and which is simple enough to allow the computation of
a best strategy. Then the new game can be used to obtain a lower bound for
the loss of the old game.


The author is very grateful to the referee for his valuable suggestions. 
3. Proof of Theorem 2. We need a lemma.


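LEMMA. Let $Z_{in}$ ($i = 1, 2, \dots, n$) be $n$ identically distributed random variables
with

\[ E(Z_{in}) = m n^{-1/2} + o(n^{-1/2}), \qquad \operatorname{Var}(Z_{in}) = \sigma_n^2 \to \sigma^2, \quad \sigma > 0, \qquad E(|Z_{in} - E(Z_{in})|^3) = b_n \to b. \]

Let further $U_k^{(n)} = \sum_{i=1}^{k} Z_{in}$. Then (for $n \to \infty$) we have

\[ \frac{1}{n} \sum_{k=1}^{n} P(U_k^{(n)} < 0) \to \int_0^1 \Phi\!\left(-\frac{m}{\sigma} x^{1/2}\right) dx, \]

where $\Phi$ denotes the standardised normal cumulative distribution function.

PROOF: Let

\[ A = \left| \frac{1}{n} \sum_{k=1}^{n} P(U_k^{(n)} < 0) - \int_0^1 \Phi\!\left(-\frac{m}{\sigma} x^{1/2}\right) dx \right| \le \sum_{k=1}^{n} A_{k,n}, \]

where

\[ n \cdot A_{k,n} = \left| P(U_k^{(n)} < 0) - n \int_{(k-1)/n}^{k/n} \Phi\!\left(-\frac{m}{\sigma} x^{1/2}\right) dx \right| \le 2. \]

As a first step we prove that $n \cdot A_{k,n} \to 0$ uniformly for all $k \ge n^{1/2}$. By a theorem
due to Berry and Esseen ([3], p. 201) we have

\[ \left| P(U_k^{(n)} < 0) - \Phi\!\left(-\frac{E(Z_{in})}{\sigma_n} k^{1/2}\right) \right| \le c\, \frac{b_n}{\sigma_n^3}\, k^{-1/2} \le c\, \frac{b_n}{\sigma_n^3}\, n^{-1/4} \to 0. \tag{3.1} \]

By the mean value theorem there is a $\xi_{k,n}$ with $(k - 1)/n \le \xi_{k,n} \le k/n$ so
that $n \int_{(k-1)/n}^{k/n} \Phi(-(m/\sigma) x^{1/2})\, dx = \Phi(-(m/\sigma) \xi_{k,n}^{1/2})$. Therefore

\[ B = \left| \Phi\!\left(-\frac{E(Z_{in})}{\sigma_n} k^{1/2}\right) - \Phi\!\left(-\frac{m}{\sigma} \xi_{k,n}^{1/2}\right) \right| = (2\pi)^{-1/2} \left| \int_{\alpha}^{\beta} e^{-x^2/2}\, dx \right|. \]

Here $\alpha = -(m/\sigma)(k/n)^{1/2} + o(1)$ and $\beta = -(m/\sigma) \xi_{k,n}^{1/2}$. Now $|\alpha - \beta| \to 0$
uniformly in $k$ and therefore $B \to 0$ uniformly in $k$. This together with (3.1)
shows that $n \cdot A_{k,n} \to 0$ uniformly in $k$ (for $k \ge n^{1/2}$). It follows that

\[ \sum_{k=[n^{1/2}]+1}^{n} A_{k,n} \to 0 \]

and we have

\[ 0 \le A \le \sum_{k=1}^{[n^{1/2}]} A_{k,n} + \sum_{k=[n^{1/2}]+1}^{n} A_{k,n} \le 2 n^{-1/2} + \sum_{k=[n^{1/2}]+1}^{n} A_{k,n} \to 0. \]

This proves the lemma.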
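The statement of the lemma can be checked numerically in a simple special case. The sketch below is an illustration only, under assumptions that are not part of the lemma itself: the $Z_{in}$ are taken to be independent differences $X_i - Y_i$ of Bernoulli variables with success probabilities $p + d/2$ and $p - d/2$, where $d = m n^{-1/2}$, so that $\sigma^2 = 2p(1 - p)$. The function names `lemma_average` and `lemma_limit` are hypothetical.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lemma_average(n, m, p=0.5):
    """Exact (1/n) * sum_{k=1..n} P(U_k < 0) for U_k = Z_1 + ... + Z_k.

    Assumption for illustration: Z_i = X_i - Y_i with independent
    X_i ~ Bernoulli(p + d/2), Y_i ~ Bernoulli(p - d/2), d = m / sqrt(n).
    """
    d = m / math.sqrt(n)
    px, py = p + d / 2.0, p - d / 2.0
    up = px * (1.0 - py)        # P(Z = +1)
    down = py * (1.0 - px)      # P(Z = -1)
    stay = 1.0 - up - down      # P(Z = 0)
    dist = [1.0]                # law of U_0; index j corresponds to value j - k
    total = 0.0
    for k in range(1, n + 1):
        new = [0.0] * (2 * k + 1)
        for j, w in enumerate(dist):
            new[j] += w * down
            new[j + 1] += w * stay
            new[j + 2] += w * up
        dist = new
        total += sum(dist[:k])  # indices 0..k-1 are the values -k..-1
    return total / n


def lemma_limit(m, p=0.5, steps=2000):
    """Integral of Phi(-(m/sigma) sqrt(x)) over [0, 1], sigma^2 = 2p(1-p).

    Uses the substitution x = y^2 and Simpson's rule (steps must be even).
    """
    sigma = math.sqrt(2.0 * p * (1.0 - p))
    h = 1.0 / steps

    def f(y):
        return 2.0 * y * phi(-(m / sigma) * y)

    s = f(0.0) + f(1.0)
    for i in range(1, steps):
        s += (4.0 if i % 2 else 2.0) * f(i * h)
    return s * h / 3.0
```

Comparing `lemma_average(n, m)` with `lemma_limit(m)` for growing `n` shows the gap shrinking, roughly like $n^{-1/2}$ for this lattice example.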







We define a new game with a loss-function $\Lambda$ so that

\[ \inf_s L(s, t) \ge \inf_s \Lambda(s, t) \tag{3.2} \]

for every fixed $n$ and $t$. The new game will give the experimenter greater
strategic possibilities.

We can play the old game in the following manner: At every step an umpire 
performs Ex I and Ex II. The experimenter simply states whether he wants 
Ex I or Ex II this time and then the umpire tells him the outcome of that ex- 
periment which he had chosen and this outcome counts for the loss-function. 

The new game will be played as follows: The umpire performs the two ex- 
periments. The experimenter states which experiment he wants. Then the 
umpire tells the experimenter the outcomes of Ex I and of Ex II. For the loss- 
function, only the outcome of the chosen experiment counts, but of course this 
time the information which the experimenter gets at every step is greater. As 
the experimenter is free to use this additional information or not he has all the 
strategic possibilities he had before as well as some new ones. The set of values
of $\Lambda$ on the right side of inequality (3.2) contains the set of values of $L$ on the
left side. Therefore the inequality is justified.

We will now show that there is a uniformly best strategy in the new game,
namely to play whichever experiment is ahead so far.

In the old game the choice at a certain step will not only influence the gain
at this step but also the information available at the following steps. In the new
game the information cannot be influenced by a strategy, so that the best thing
to do is to minimize the loss step by step. All we have to do is to get a strategy
which (at the $(k + 1)$th step) minimizes $1 - \Pi_{k+1}(\sigma, \tau) + \Pi_{k+1}(\tau, \sigma)$ or
maximizes $\Pi_{k+1}(\sigma, \tau) - \Pi_{k+1}(\tau, \sigma)$ where $\sigma \ge \tau$ (see (1.1)).

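The $(k + 1)$th step of a strategy is given by a function

\[ f = f(X_1, \dots, X_k; Y_1, \dots, Y_k) \]

with $0 \le f \le 1$. $f$ gives the probability of choosing Ex I as a function of the
previous history and we have $\Pi_{k+1} = E(f)$. In the new game $X_1, \dots, X_k$ and
$Y_1, \dots, Y_k$ are independent binomial random variables with parameters $p$ and $q$.
Since $(\sum_{i=1}^{k} X_i = \mu, \sum_{i=1}^{k} Y_i = \nu)$ is sufficient for $(p, q)$ we may restrict
ourselves to functions of $\mu$ and $\nu$ alone. Therefore

\[ \Pi_{k+1}(\sigma, \tau) = E(f \mid \sigma, \tau) = \sum_{\mu=0}^{k} \sum_{\nu=0}^{k} f(\mu, \nu)\, P\!\left(\sum_{i=1}^{k} X_i = \mu \,\Big|\, \sigma\right) P\!\left(\sum_{i=1}^{k} Y_i = \nu \,\Big|\, \tau\right) \]

with $P(\sum_{i=1}^{k} X_i = \mu \mid \sigma) = \binom{k}{\mu} \sigma^{\mu} (1 - \sigma)^{k-\mu}$ and similarly for $Y$, $\nu$, $\tau$. It
follows that

\[ \Pi_{k+1}(\sigma, \tau) - \Pi_{k+1}(\tau, \sigma) = \sum_{\mu=0}^{k} \sum_{\nu=0}^{k} f(\mu, \nu) \binom{k}{\mu} \binom{k}{\nu} \sigma^{\mu} (1 - \sigma)^{k-\mu} \tau^{\nu} (1 - \tau)^{k-\nu} D_{\mu\nu}, \]

where $D_{\mu\nu} = 1 - \left( \tau(1 - \sigma) / \sigma(1 - \tau) \right)^{\mu-\nu} \ge 0$ if $\mu > \nu$ and $D_{\mu\nu} \le 0$ if $\mu < \nu$.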

This shows that we maximize $\Pi_{k+1}(\sigma, \tau) - \Pi_{k+1}(\tau, \sigma)$ by choosing $f(\mu, \nu) = 1$
for $\mu > \nu$ and $f(\mu, \nu) = 0$ for $\mu < \nu$. For $\mu = \nu$ we choose $f(\mu, \nu) = \tfrac{1}{2}$ in order
to have a symmetric strategy. It follows that the best symmetric strategy
(regardless of the strategy of nature) is to play Ex I if $\mu > \nu$ and to play Ex I
with probability $\tfrac{1}{2}$ when $\mu = \nu$.

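Let $X_i - Y_i = Z_i$ and $\sum_{i=1}^{k} Z_i = U_k$. If we use the best symmetric strategy
we have

\[ \Pi_{k+1} = P(U_k > 0) + \tfrac{1}{2} P(U_k = 0). \]

The formula is correct even for $k = 0$ if we put $U_0 = 0$. The loss-function $\Lambda$
is then (see (1.1))

\[ \Lambda = \frac{\sigma - \tau}{2} \sum_{k=0}^{n-1} \Big( 1 - P(U_k > 0 \mid \sigma, \tau) - \tfrac{1}{2} P(U_k = 0 \mid \sigma, \tau) + P(U_k > 0 \mid \tau, \sigma) + \tfrac{1}{2} P(U_k = 0 \mid \tau, \sigma) \Big). \]

Using $P(U_k = 0 \mid \sigma, \tau) = P(U_k = 0 \mid \tau, \sigma)$ and $P(U_k > 0 \mid \tau, \sigma) = P(U_k < 0 \mid \sigma, \tau)$ we get

\[ \Lambda = (\sigma - \tau) \sum_{k=0}^{n-1} \Big( P(U_k < 0 \mid \sigma, \tau) + \tfrac{1}{2} P(U_k = 0 \mid \sigma, \tau) \Big). \tag{3.3} \]

Let nature use $T_0$, i.e. let $\sigma_n - \tau_n = m n^{-1/2} + o(n^{-1/2})$ and $\sigma_n, \tau_n \to p$. $U_k$ will
depend now on $n$ and we write $U_k^{(n)}$. In (3.3) we drop the terms $P(U_k = 0 \mid \sigma, \tau)$
and we write $\ge$ in place of $=$. It then follows that

\[ n^{-1/2} \Lambda_n \ge m \cdot \frac{1}{n} \sum_{k=1}^{n} P(U_k^{(n)} < 0 \mid \sigma, \tau) + o(1). \]

For $U_k^{(n)}$ the assumptions of the lemma hold and therefore

\[ \liminf_{n \to \infty} n^{-1/2} \Lambda_n \ge m \int_0^1 \Phi\!\left(-\frac{m}{\tilde\sigma} x^{1/2}\right) dx, \]

where $\tilde\sigma^2 = 2p(1 - p)$. As we have used the best possible strategy (say $\hat s$) we
conclude that

\[ n^{-1/2} L(s_n, t_n^{(0)}) \ge \inf_s n^{-1/2} \Lambda(s, t_n^{(0)}) = n^{-1/2} \Lambda(\hat s, t_n^{(0)}). \]

But then we have

\[ \liminf_{n \to \infty} \left( L(s_n, t_n^{(0)}) / n^{1/2} \right) \ge C_1 > 0 \]

with $C_1 = m \int_0^1 \Phi(-m x^{1/2} / \tilde\sigma)\, dx$.

PROOF OF THE COROLLARY TO THEOREM 2: $C_1$ attains its maximum for $\tilde\sigma^2 = \tfrac{1}{2}$
(i.e. for $p = \tfrac{1}{2}$) and $m = 0.849$. The value of the maximum is 0.187. This follows
by numerical computations.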
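The numerical computation behind the corollary can be sketched as follows. This is a hedged illustration, not the author's computation: it assumes that the constant is $C_1 = m \int_0^1 \Phi(-m x^{1/2}/\tilde\sigma)\, dx$ with $\tilde\sigma^2 = 2p(1-p)$, evaluates the integral by Simpson's rule after substituting $x = y^2$, and scans a grid of values of $m$. The names `c1` and `maximise_c1` are hypothetical. The maximum is rather flat in $m$, so a grid search fixes the maximal value much more sharply than the maximising $m$.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def c1(m, p=0.5, steps=2000):
    """C_1 = m * integral_0^1 Phi(-m sqrt(x) / sigma) dx, sigma^2 = 2p(1-p)."""
    sigma = math.sqrt(2.0 * p * (1.0 - p))
    h = 1.0 / steps

    def f(y):
        # integrand after the substitution x = y^2
        return 2.0 * y * phi(-(m / sigma) * y)

    s = f(0.0) + f(1.0)
    for i in range(1, steps):
        s += (4.0 if i % 2 else 2.0) * f(i * h)
    return m * s * h / 3.0


def maximise_c1(p=0.5, grid=200, m_max=2.0):
    """Grid search over m in (0, m_max]; returns (best m, best value)."""
    candidates = [j * m_max / grid for j in range(1, grid + 1)]
    best_m = max(candidates, key=lambda m: c1(m, p, steps=400))
    return best_m, c1(best_m, p)
```

Calling `c1(0.849)` gives the value of the bound at the $m$ quoted in the corollary, and `maximise_c1(p)` can be used to see how the bound falls off as $p$ moves away from $\tfrac{1}{2}$.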







4. Proof of Theorem 3. We give some results from [7] which will be needed
in the sequel. The strategy treated in [7] runs as follows: At the first $2K$ steps
($K$ is a random variable) Ex I and Ex II will be used alternately so that we
have a sequence of random variables $X_1, Y_1, X_2, Y_2, \dots, X_K, Y_K$. Let
$U_k = \sum_{i=1}^{k} (X_i - Y_i)$. As long as $-a < U_k < +a$ we make another pair of
experiments and so get $X_{k+1}, Y_{k+1}$. When $U_k \ge +a$ we continue for the
remaining $n - 2k$ steps with Ex I and when $U_k \le -a$ we do so with Ex II. In every
case, we stop after $n$ experiments. A sequence of strategies for the experimenter
consists of choosing a number $a$ for each $n$, and from now on $s_n$ always means
this kind of strategy.

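It was shown in [7] that

\[ L_n \le n(\sigma - \tau)/(u^{a} + 1) + a(u^{a} - 1)^2/(u^{a} + 1)^2 = M \tag{4.1} \]

where $\sigma = \max(p, q)$ and $\tau = \min(p, q)$ and $u = \sigma(1 - \tau)/\tau(1 - \sigma)$.
There are strategies $s_n^{(0)}$ and $t_n^{(0)}$ such that, for all $s_n$ and $t_n$,

\[ M(s_n^{(0)}, t_n) \le M(s_n^{(0)}, t_n^{(0)}) \le M(s_n, t_n^{(0)}), \tag{4.2} \]

i.e. there are minimax strategies for the approximate loss-function $M$. For the
strategies $s_n^{(0)}$ we have

\[ a_n = 0.292\, n^{1/2} + o(n^{1/2}), \tag{4.3} \]

and for the strategies $t_n^{(0)}$ we have

\[ \sigma_n - \tau_n = 1.89\, n^{-1/2} + o(n^{-1/2}); \qquad \sigma_n, \tau_n \to \tfrac{1}{2}, \tag{4.4} \]

and

\[ u_n^{a_n} \to 9.06. \tag{4.5} \]

It is now easy to prove Theorem 3. We use (4.1) and (4.2) and get

\[ n^{-1/2} L(s_n^{(0)}, t_n) \le n^{-1/2} M(s_n^{(0)}, t_n) \le n^{-1/2} M(s_n^{(0)}, t_n^{(0)}). \]

The last expression converges (using (4.3), (4.4) and (4.5)) to 0.376. Therefore

\[ \limsup_{n \to \infty} n^{-1/2} L(s_n^{(0)}, t_n) \le 0.376. \]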
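The limiting value in this argument can be reproduced with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: it takes $M$ to be $n(\sigma - \tau)/(u^{a} + 1) + a(u^{a} - 1)^2/(u^{a} + 1)^2$ with $u = \sigma(1-\tau)/\tau(1-\sigma)$, and it places $\sigma_n$ and $\tau_n$ symmetrically about $\tfrac{1}{2}$, which is only one way of satisfying (4.4). With $a_n = a n^{1/2}$ and $\sigma_n - \tau_n = d n^{-1/2}$ one then has $u_n^{a_n} \to e^{4ad}$. The names `approx_loss_M` and `limit_M` are hypothetical.

```python
import math


def approx_loss_M(n, a_n, sigma, tau):
    """Approximate loss M for horizon n, boundary a_n and success
    probabilities sigma >= tau (form assumed from the text above)."""
    u = sigma * (1.0 - tau) / (tau * (1.0 - sigma))
    w = u ** a_n
    return n * (sigma - tau) / (w + 1.0) + a_n * ((w - 1.0) / (w + 1.0)) ** 2


def limit_M(a, d):
    """Limit of n**-0.5 * M when a_n = a*sqrt(n), sigma_n - tau_n = d/sqrt(n)
    and sigma_n, tau_n -> 1/2, so that u_n**a_n -> exp(4*a*d)."""
    w = math.exp(4.0 * a * d)
    return d / (w + 1.0) + a * ((w - 1.0) / (w + 1.0)) ** 2
```

Evaluating `limit_M(0.292, 1.89)` and scanning `limit_M(0.292, d)` over a grid of `d` shows that, for the boundary constant of (4.3), the least favourable difference sits close to the constant of (4.4).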


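5. Proof of Theorem 4. Now let $s_n^{(0)}$ be a strategy as defined in (2.3), i.e. let
$a_n = a n^{1/2} + o(n^{1/2})$ and $a > 0$. Then

\[ n^{-1/2} L(s_n^{(0)}, t_n) \le n^{-1/2} M(s_n^{(0)}, t_n) = n^{1/2}(\sigma - \tau)/(u^{a_n} + 1) + (a + o(1))(u^{a_n} - 1)^2/(u^{a_n} + 1)^2. \tag{5.1} \]

The last term is smaller than or equal to $a + o(1)$. To get an upper bound for the
term $n^{1/2}(\sigma - \tau)/(u^{a_n} + 1)$ we first show

\[ \ln u \ge (\sigma - \tau) \ln 5. \]

From $\sigma = \tau$ follows $u = 1$. Therefore the inequality holds for $\sigma = \tau$. Now let
$\sigma > \tau$. Because $\tau(1 - \sigma) < \tfrac{1}{4}$ we have

\[ (\sigma - \tau)^{-1} \ln u = (\sigma - \tau)^{-1} \ln\!\big(1 + (\sigma - \tau)/\tau(1 - \sigma)\big) > (\sigma - \tau)^{-1} \ln\!\big(1 + 4(\sigma - \tau)\big) \ge \min_{0 < x \le 1} x^{-1} \ln(1 + 4x) = \ln 5. \]

Therefore

\[ n^{1/2}(\sigma - \tau)/(u^{a_n} + 1) \le n^{1/2}(\sigma - \tau) / 5^{\,n^{1/2}(\sigma - \tau)(a + o(1))} \le \max_{x \ge 0} x / 5^{\,x(a + o(1))} = (e \cdot a \cdot \ln 5)^{-1} + o(1). \]

This proves that

\[ \limsup_{n \to \infty} n^{-1/2} M(s_n^{(0)}, t_n) \le C_2 \]

with $C_2 = a + (a e \ln 5)^{-1}$. Taking account of (5.1) it also proves (2.7).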
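The constant of Theorem 4 is easy to explore numerically. The following sketch is an illustration only: it evaluates $C_2 = a + (a e \ln 5)^{-1}$, locates the minimising $a$ by elementary calculus (the derivative vanishes at $a = (e \ln 5)^{-1/2}$), and checks the auxiliary inequality $\ln u \ge (\sigma - \tau) \ln 5$ on a grid. The function names are hypothetical.

```python
import math


def c2(a):
    """Upper-bound constant C_2 = a + 1 / (a * e * ln 5) of Theorem 4."""
    return a + 1.0 / (a * math.e * math.log(5.0))


def best_a():
    """Minimiser of c2, obtained by setting the derivative to zero."""
    return 1.0 / math.sqrt(math.e * math.log(5.0))


def log_u(sigma, tau):
    """ln u with u = sigma(1 - tau) / (tau(1 - sigma)), 0 < tau <= sigma < 1."""
    return math.log(sigma * (1.0 - tau) / (tau * (1.0 - sigma)))
```

`c2(best_a())` gives the smallest constant that this particular bound can deliver, which can be compared with the sharper constant of Theorem 3.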
REFERENCES

[1] R. Bellman, "A problem in the sequential design of experiments," Sankhyā, Vol. 16
(1956), pp. 221-229.

[2] R. N. Bradt, S. M. Johnson, and S. Karlin, "On sequential designs for maximizing
the sum of n observations," Ann. Math. Stat., Vol. 27 (1956), pp. 1060-1074.

[3] B. V. Gnedenko and A. N. Kolmogorov, "Limit distributions for sums of independent
random variables," translated by K. L. Chung, Cambridge, Addison-Wesley,
1954.

[4] J. R. Isbell, "On a problem of Robbins," Ann. Math. Stat., Vol. 30 (1959), pp. 606-610.

[5] H. E. Robbins, "Some aspects of the sequential design of experiments," Bull. Am.
Math. Soc., Vol. 58 (1952), pp. 527-535.

[6] H. Robbins, "A sequential decision problem with a finite memory," Proc. Nat. Acad.
Sci., Vol. 42 (1956), pp. 920-923.

[7] Walter Vogel, "A sequential design for the two armed bandit," Ann. Math. Stat., Vol. 31
(1960), pp. 430-443.





AN ELEMENTARY PROOF OF THE AEP OF INFORMATION
THEORY¹


By A. J. THOMASIAN 
University of California, Berkeley 


1. Summary. Properties of the sequence of random variables —(1/n) log p 
are obtained for an arbitrary, not necessarily ergodic or stationary, information 
source. These permit an elementary combinatorial proof of the AEP (asymptotic 
equipartition property). 


2. Definitions and introduction. Let $A$ be a set of $r \ge 2$ symbols and let $A^{(n)}$
be the set of $n$-tuples from $A$. We call $A$ an alphabet, and an element of $A^{(n)}$ a
message of length $n$. Let $A'$ be the set of infinite sequences $(y_1, y_2, \dots)$ where
each $y_i \in A$, and let $P$ be a probability distribution on the $\sigma$-field of subsets of $A'$
determined by the cylinder sets. We call $(A', P)$ an information source and
define a sequence of nonnegative random variables $X_n$ by

\[ X_n(y_1, y_2, \dots) = \begin{cases} -n^{-1} \log P[Y_1 = y_1, \dots, Y_n = y_n] & \text{if } P[Y_1 = y_1, \dots, Y_n = y_n] > 0, \\ 0 & \text{if } P[Y_1 = y_1, \dots, Y_n = y_n] = 0, \end{cases} \]


where all our logarithms are to the base 2. In extending Shannon's work [5],
McMillan [4] introduced the definition that a source has the AEP if $X_n$ converges
in probability to a constant. For a stationary ergodic process, McMillan [4]
proved that $X_n$ converged to the constant given in Section 4 in $L^1$ mean and in
probability; while Breiman [1] obtained convergence with probability one. Both
proofs use an ergodic theorem and martingales. The proofs of Feinstein [2] and
Khinchin [3] follow McMillan.

For any integer $n$ and any number $\beta$, define $D_n(\beta)$ to be the largest probability
of any subset of $A^{(n)}$ which has at most $2^{\beta n}$ elements. In Section 3 we obtain
relations between $P[X_n \le \beta]$, $D_n(\beta)$, and $EX_n$ for an arbitrary source. In Section 4
we restrict ourselves to stationary ergodic sources and use $D_n(\beta)$ and Theorem 3
to prove the AEP. Except for two simple properties of entropy, Shannon [5] or
Khinchin [3], p. 4 and p. 6, the paper is self-contained.

Henceforth we will consider a fixed source $(A', P)$ and its associated $r$, $X_n$,
$D_n(\beta)$.


3. Relations between $P[X_n \le \beta]$, $D_n(\beta)$, and $EX_n$.
Lemma 1. For all $\epsilon > 0$, $\beta$

\[ P[X_n \le \beta] \le D_n(\beta) \le P[X_n \le \beta + \epsilon] + 2^{-\epsilon n}. \]
Received August 24, 1959, 


1 This research was supported by the Office of Naval Research under Contract Nonr- 
222 (53). 









Proof. Let $B$ be the subset of elements of $A^{(n)}$ which have positive probability
and which belong to $[X_n \le \beta]$, and let $M$ be the number of elements in $B$. Any
point in $B$ has probability $\ge 2^{-\beta n}$ so that $1 \ge P[X_n \le \beta] \ge M 2^{-\beta n}$. Thus $M \le 2^{\beta n}$
so that $D_n(\beta) \ge P(B)$ and the left-hand inequality is proved.

Let $F \subset A^{(n)}$ have at most $2^{\beta n}$ elements and satisfy $P(F) = D_n(\beta)$. Then

\[ D_n(\beta) = P(F) = P(F \cap [X_n \le \beta + \epsilon]) + P(F \cap [X_n > \beta + \epsilon]) \le P[X_n \le \beta + \epsilon] + 2^{\beta n}\, 2^{-(\beta + \epsilon) n}, \]

and the right-hand inequality is proved.
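Lemma 1 can be verified by brute force for a small source. The sketch below is an illustration under an assumption that is not in the text: the source is taken to be independent and identically distributed over a binary alphabet, with `p` the probability of the symbol 1, and all $2^n$ messages are enumerated. $D_n(\beta)$ is computed as the sum of the $\lfloor 2^{\beta n} \rfloor$ largest message probabilities. All function names are hypothetical.

```python
import math
from itertools import product


def message_probs(n, p=0.3):
    """Probabilities of all binary messages of length n for an i.i.d.
    source (an assumption made only for this illustration)."""
    return [p ** sum(w) * (1.0 - p) ** (n - sum(w))
            for w in product((0, 1), repeat=n)]


def prob_x_le(probs, n, beta):
    """P[X_n <= beta], where X_n = -(1/n) log2 P(message)."""
    return sum(q for q in probs if q > 0 and -math.log2(q) / n <= beta)


def d_n(probs, n, beta):
    """Largest probability of a set of at most 2**(beta*n) messages."""
    size = int(math.floor(2.0 ** (beta * n)))
    return sum(sorted(probs, reverse=True)[:size])


def lemma1_holds(probs, n, beta, eps, tol=1e-12):
    """Check P[X_n <= beta] <= D_n(beta) <= P[X_n <= beta+eps] + 2**(-eps*n)."""
    middle = d_n(probs, n, beta)
    left = prob_x_le(probs, n, beta)
    right = prob_x_le(probs, n, beta + eps) + 2.0 ** (-eps * n)
    return left <= middle + tol and middle <= right + tol
```

For example, `lemma1_holds(message_probs(10), 10, 0.8, 0.1)` checks both inequalities for messages of length 10.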

Lemma 2. Let $\beta_n$ be any sequence of numbers. Then

(a) $D_n(\beta_n + \epsilon) \to 1$ for all $\epsilon > 0$ if and only if $P[X_n \le \beta_n + \epsilon] \to 1$ for all
$\epsilon > 0$.

(b) $D_n(\beta_n - \epsilon) \to 0$ for all $\epsilon > 0$ if and only if $P[X_n \le \beta_n - \epsilon] \to 0$ for all
$\epsilon > 0$.

Proof. Immediate from Lemma 1.

We pause for a moment to ask an incidental question. If there is a number $\beta$
such that $X_n \xrightarrow{P} \beta$ ($X_n$ converges in probability to $\beta$) and if for each $n$ we select
$\beta_n$ so that $D_n(\beta_n)$ is approximately .8, then, must $\beta_n \to \beta$? The answer is yes by
Theorem 1, which generalizes similar theorems proved by Shannon [5], Theorem
4, and Khinchin [3], Theorem 3, p. 20.

Theorem 1. If $\alpha$, $\beta$, $\beta_n$ are numbers such that $X_n \xrightarrow{P} \beta$, $0 < \alpha < 1$, and
$\alpha < D_n(\beta_n) < 1 - \alpha$ for all $n$, then $\beta_n \to \beta$.

Proof. If $\beta_n$ does not converge to $\beta$, then there is an $\epsilon > 0$ and a subsequence
$\beta_{n'}$ such that either $\beta_{n'} \ge \beta + \epsilon$ for all $n'$, or $\beta_{n'} \le \beta - \epsilon$ for all $n'$. In either
case the assumption $\alpha < D_n(\beta_n) < 1 - \alpha$ is contradicted by Lemma 2 with
$\beta_n$ replaced by $\beta$, so that Theorem 1 is proved.

Theorem 2 and Lemma 4 show that to some extent the random variables $X_n$
enjoy some of the properties of a sequence of uniformly bounded random
variables. The proofs are based on

Lemma 3. For any numbers $\epsilon > 0$, $\delta > 0$, $\beta \ge 0$

(a) $\delta P[X_n < \beta - \delta] \le (\beta - EX_n) + \epsilon + (\log r) P[X_n > \beta + \epsilon] - n^{-1} P[X_n > \beta + \epsilon] \log P[X_n > \beta + \epsilon]$

(b) $\epsilon P[X_n > \beta + \epsilon] \le (EX_n - \beta) + \delta + (\beta - \delta) P[X_n < \beta - \delta]$.


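Proof. We first prove (a). $EX_n = \int_{[X_n < \beta - \delta]} X_n\, dP + \int_{[\beta - \delta \le X_n \le \beta + \epsilon]} X_n\, dP + \int_{[X_n > \beta + \epsilon]} X_n\, dP$. Thus

\[ EX_n \le (\beta - \delta) P[X_n < \beta - \delta] + (\beta + \epsilon)(1 - P[X_n < \beta - \delta]) + \int_{[X_n > \beta + \epsilon]} X_n\, dP \le (\beta + \epsilon) - (\delta + \epsilon) P[X_n < \beta - \delta] + \int_{[X_n > \beta + \epsilon]} X_n\, dP, \]

so that we need only show that $\int_{[X_n > \beta + \epsilon]} X_n\, dP \le (\log r) p - (1/n)\, p \log p$ where
$p = P[X_n > \beta + \epsilon]$. Now we recall, Khinchin [3], p. 4, that if $p_i > 0$ and
$\sum_{i=1}^{k} p_i = p$, then

\[ -\sum_{i=1}^{k} \frac{p_i}{p} \log \frac{p_i}{p} \le \log k, \]

so that

\[ -\sum_{i=1}^{k} p_i \log p_i \le p \log k - p \log p. \]

Since $A^{(n)}$ has $r^n$ points we see that $\int_{[X_n > \beta + \epsilon]} X_n\, dP \le (1/n)[p \log r^n - p \log p]$
and the proof of part (a) is completed.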
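The entropy inequality used in this step can be tried out directly. The sketch below is a small illustration with a hypothetical helper name: for positive numbers $p_1, \dots, p_k$ with sum $p$, it returns the slack in $-\sum p_i \log p_i \le p \log k - p \log p$, which should be nonnegative and should vanish when all the $p_i$ are equal.

```python
import math


def entropy_bound_slack(ps):
    """Return (p log k - p log p) - (-sum p_i log p_i), logs to base 2,
    for a list ps of positive numbers with sum p and length k."""
    p = sum(ps)
    k = len(ps)
    lhs = -sum(q * math.log2(q) for q in ps)
    rhs = p * math.log2(k) - p * math.log2(p)
    return rhs - lhs
```

With `ps = [0.1, 0.1, 0.1]` the slack is zero up to rounding, while an uneven list such as `[0.05, 0.2, 0.01]` leaves a strictly positive slack.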


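To prove part (b) we start from the same initial decomposition of $EX_n$ as in
part (a) and obtain

\[ EX_n \ge (\beta - \delta)(1 - P[X_n < \beta - \delta] - P[X_n > \beta + \epsilon]) + (\beta + \epsilon) P[X_n > \beta + \epsilon] \ge (\beta - \delta) - (\beta - \delta) P[X_n < \beta - \delta] + (\delta + \epsilon) P[X_n > \beta + \epsilon], \]

so that part (b) is proved.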

Theorem 2. If for some $\beta$ we have $X_n \xrightarrow{P} \beta$ then $EX_n \to \beta$.

Proof. Immediate from Lemma 3.

The next result in a sense permits us to eliminate half of our task whenever we
try to prove that $X_n - EX_n \xrightarrow{P} 0$.

Lemma 4. $P[X_n \le EX_n + \epsilon] \to 1$ for all $\epsilon > 0$ if and only if

\[ P[X_n \le EX_n - \epsilon] \to 0 \]

for all $\epsilon > 0$.

Proof. Immediate from Lemma 3 when $\beta$ is replaced by $EX_n$.

Theorem 3. If $EX_n$ converges to some number $\beta$ and any one of

\[ P[X_n \le \beta + \epsilon] \to 1, \qquad P[X_n \le \beta - \epsilon] \to 0, \qquad D_n(\beta + \epsilon) \to 1, \qquad D_n(\beta - \epsilon) \to 0 \]

is true for all $\epsilon > 0$, then $X_n \xrightarrow{P} \beta$.

Proof. Immediate from Lemmas 2 and 4.


4. Proof of the AEP. Henceforth we will consider only stationary sources. Thus
we assume, for all $k \ge 1$ and for all $(y_1, \dots, y_k) \in A^{(k)}$, that

\[ P[Y_{j+1} = y_1, \dots, Y_{j+k} = y_k] \]

is independent of $j \ge 0$. For any $m \ge 2$, $I \in A^{(m-1)}$, $j \in A$ we mean by $(I, j)$
that element of $A^{(m)}$ whose first $m - 1$ coordinates agree with $I$ and whose last
coordinate is $j$; and we define $q_I = P(I)$ and $q_{Ij} = P(I, j)/P(I)$ for $P(I) > 0$.
Let $H_m = -\sum q_I q_{Ij} \log q_{Ij}$, where the sum is over all $(I, j)$ with $q_I q_{Ij} > 0$.







Clearly $H_m \ge 0$ and it is well known, Khinchin [3], p. 6, that $H_m \le \log r$ is
nonincreasing so that

\[ EX_n = \frac{1}{n} \sum_{m=1}^{n} H_m \to H = \lim H_m. \]

If $X_n$ converges in probability to some constant, then we know from Theorem
2 that this constant must be $H$.

For a given $m \ge 2$, and $I \in A^{(m-1)}$, $j \in A$, $n$ we define the random variable
$N^n_{Ij} = N^n_{Ij}(y_1, y_2, \dots)$ as the number of integers $i$ with $1 \le i \le n - m + 1$
such that $(y_i, y_{i+1}, \dots, y_{i+m-1}) = (I, j)$. We will call a stationary source
ergodic if for all $m$, $I$, $j$

\[ N^n_{Ij}/n \xrightarrow{P} q_I q_{Ij}. \]

This definition of ergodic is intuitively appealing and it is precisely this property
which we will use in our proof of the AEP. It is easy to show, Khinchin [3], p. 49,
that our definition of ergodic is equivalent to the usual one.

Theorem 4. $X_n \xrightarrow{P} H$ for any stationary ergodic source.

Proof. It is clear from Theorem 3 that it is sufficient to prove that for all
$m$, $\epsilon > 0$ we have $D_n(H_m + \epsilon) \to 1$. We will do this by exhibiting, for every $m$,
$\epsilon > 0$, a sequence of sets $B_n \subset A^{(n)}$ with $P(B_n) \to 1$ and

\[ M(B_n) \le 2^{(H_m + \epsilon) n} \quad \text{for all large } n \]

where $M(B_n)$ is the number of elements in $B_n$.
Let $m$, $\epsilon > 0$ be given, and for arbitrary $\delta > 0$ define $B_n$ as the set of
$(y_1, \dots, y_n)$ such that

\[ \left| \frac{N^n_{Ij}(y_1, \dots, y_n)}{n} - q_I q_{Ij} \right| \le \delta \quad \text{for all } I, j \text{ with } q_I q_{Ij} > 0, \]

\[ N^n_{Ij}(y_1, \dots, y_n) = 0 \quad \text{for all } I, j \text{ with } q_I = 0 \text{ or } q_{Ij} = 0. \]

Clearly $B_n \subset A^{(n)}$ and $P(B_n) \to 1$, so that we need only bound $M(B_n)$
appropriately, to complete the proof. We now use the $q_{Ij}$, $q_I$ to define a new stochastic
process, with probability distribution $Q$ on $A'$, which is to be a multiple Markov
chain. Thus we start our $Q$ process off with $q_I$ as the initial distribution and use
$q_{Ij}$ for our transition probabilities. For the $Q$ process, for any $n \ge m$ and any

\[ (y_1, \dots, y_n) \in A^{(n)}, \]

we have

\[ Q[Y_n = y_n \mid Y_1 = y_1, \dots, Y_{n-1} = y_{n-1}] = Q[Y_n = y_n \mid Y_{n-m+1} = y_{n-m+1}, \dots, Y_{n-1} = y_{n-1}]; \]

that is, the conditional probabilities of future states depend only on the $m - 1$
past states. Now for any $(y_1, \dots, y_n) \in B_n$ we have

\[ Q(y_1, \dots, y_n) = q_I \prod (q_{Ij})^{N^n_{Ij}}, \]

where $q_I > 0$ (here $I$ is the initial block $(y_1, \dots, y_{m-1})$) and the product is over
all $(I, j)$ with $q_I q_{Ij} > 0$. Since

\[ (y_1, \dots, y_n) \in B_n \]

we have

\[ N^n_{Ij}(y_1, \dots, y_n) \le (q_I q_{Ij} + \delta) n, \]

so that

\[ Q(y_1, \dots, y_n) \ge (q_I) \Big( \prod q_{Ij} \Big)^{\delta n} \Big[ \prod (q_{Ij})^{q_I q_{Ij}} \Big]^{n} \ge 2^{-(H_m + \epsilon) n}, \]

where the last inequality is obtained for $\delta$ small enough and $n$ large enough so
that $(q_I)^{1/n} (\prod q_{Ij})^{\delta} \ge 2^{-\epsilon}$. Under these conditions

\[ 1 \ge Q(B_n) \ge 2^{-(H_m + \epsilon) n} M(B_n), \]

and the proof is completed.
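For the simplest stationary ergodic source the convergence asserted by the theorem can be computed exactly rather than simulated. The sketch below is an illustration under an assumption not made in the paper: the source emits independent binary symbols, with `p` the probability of the symbol 1, so that $X_n$ depends on the message only through the number of ones and $H$ is the binary entropy of `p`. The function names are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy of p in bits."""
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def prob_far_from_entropy(n, p, eps):
    """Exact P[|X_n - H| > eps] for an i.i.d. binary source, 0 < p < 1.

    X_n = -(1/n) log2 P(message); with k ones in the message this equals
    -(k log2 p + (n - k) log2(1 - p)) / n, and k is Binomial(n, p).
    """
    h = entropy(p)
    total = 0.0
    for k in range(n + 1):
        x = -(k * math.log2(p) + (n - k) * math.log2(1.0 - p)) / n
        if abs(x - h) > eps:
            # binomial probability computed in logs to avoid underflow
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                       - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1.0 - p))
            total += math.exp(log_pmf)
    return total
```

Comparing `prob_far_from_entropy(50, 0.3, 0.05)` with `prob_far_from_entropy(2000, 0.3, 0.05)` shows the probability of a deviation of more than 0.05 bits collapsing as the message length grows.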
REFERENCES

[1] L. Breiman, "The individual ergodic theorem of information theory," Ann. Math. Stat.,
Vol. 28 (1957), pp. 809-811.

[2] A. Feinstein, Foundations of Information Theory, McGraw-Hill, New York, 1958.

[3] A. I. Khinchin, Mathematical Foundations of Information Theory, Dover Publications,
1957.

[4] B. McMillan, "The basic theorems of information theory," Ann. Math. Stat., Vol. 24
(1953), pp. 196-219.

[5] C. E. Shannon, "A mathematical theory of communication," Bell System Tech. J., Vol.
27 (1948), pp. 379-423, and pp. 623-656.





PRODUCTS OF RANDOM MATRICES 
By H. Furstenberg and H. Kesten¹
Princeton University
1. Introduction. Let $X^1, X^2, X^3, \dots$ form a stationary stochastic process with
values in the set of $k \times k$ matrices. The asymptotic behavior of the product
${}^{n}Y^{1} = X^{n} X^{n-1} \cdots X^{1}$ has been studied by Bellman [1] in certain cases. In
particular, Bellman showed that, if the $X^i$ are independent and have strictly
positive entries, then, under certain conditions,

\[ \lim_{n \to \infty} n^{-1} E\{\log ({}^{n}Y^{1})_{i,j}\} \]

exists, where the subscripts $i,j$ refer to the $ij$th entry of the matrix. In addition,
Bellman conjectured that

\[ n^{-1/2} \big( \log ({}^{n}Y^{1})_{i,j} - E\{\log ({}^{n}Y^{1})_{i,j}\} \big) \]

is asymptotically normally distributed. These assertions are motivated by
considering the case where the matrices in the range of the $X^i$ commute so that
they may be simultaneously diagonalised.

We shall arrive at Bellman's result by considering first the behavior of the
norms $\| {}^{n}Y^{1} \|$. We find that, under very general conditions, the limit of
$n^{-1} \log \| {}^{n}Y^{1} \|$ exists almost everywhere, as well as $\lim_{n \to \infty} n^{-1} E\{\log \| {}^{n}Y^{1} \|\}$. Under
certain positivity assumptions on the entries of the possible matrices, the
asymptotic behavior of the $({}^{n}Y^{1})_{i,j}$ is deducible from that of $\| {}^{n}Y^{1} \|$, and this will
enable us to strengthen Bellman's result to an almost everywhere statement.
The conjecture regarding normality will be proven in certain cases and we
shall give examples to show that the possibilities of further extension are
limited.


2. Asymptotic behavior of the norm. For a $k \times k$ matrix $A$ with real or complex
entries we define the norm of $A$ by $\| A \| = \max_i \sum_j | A_{i,j} |$. If $B$ is another
$k \times k$ matrix we have

\[ \| AB \| \le \| A \| \, \| B \|. \tag{2.1} \]

This simple fact gives us:

Theorem 1: If $X^1, X^2, X^3, \dots$ form a stationary stochastic process with values in
the set of $k \times k$ matrices, then

\[ \lim_{n \to \infty} n^{-1} E\{\log \| X^{n} X^{n-1} \cdots X^{1} \|\} = E \tag{2.2} \]
Received October 12, 1959. 


¹ Work supported by the U. S. Army Ordnance, Office of Ordnance Research, Contract
DA 36-034-ORD-2001.







exists ($E$ is not necessarily finite). If, in addition, the $X$-process is metrically
transitive and $E\{\log^{+} \| X^{1} \|\} < \infty$,² then

\[ \limsup n^{-1} \log \| X^{n} X^{n-1} \cdots X^{1} \| \le E \tag{2.3} \]

with probability 1.

Proof: Set

\[ {}^{n}Y^{m} = X^{n} X^{n-1} \cdots X^{m}. \tag{2.4} \]

Then, by (2.1),

\[ \log \| {}^{n+m}Y^{1} \| = \log \| {}^{n+m}Y^{m+1} \cdot {}^{m}Y^{1} \| \le \log \| {}^{n+m}Y^{m+1} \| + \log \| {}^{m}Y^{1} \|, \]

and, since the process is stationary, $E\{\log \| {}^{n+m}Y^{1} \|\} \le E\{\log \| {}^{n}Y^{1} \|\} + E\{\log \| {}^{m}Y^{1} \|\}$.
This, however, is known ([4], p. 98) to imply (2.2).

Under the assumption that $E\{\log^{+} \| X^{1} \|\} < \infty$, it follows that

\[ \sum_{n=1}^{\infty} P\{\log^{+} \| X^{n} \| \ge \epsilon n\} < \infty, \quad \text{for all } \epsilon > 0 \]

and therefore $\lim_{n \to \infty} n^{-1} \log^{+} \| X^{n} \| = 0$, and so $\limsup_{n \to \infty} n^{-1} \log \| X^{n} \| \le 0$,
with probability one. Hence in order to prove (2.3) it suffices to show that, for
each $\epsilon > 0$,

\[ \limsup_{n \to \infty} (nN)^{-1} \log \| {}^{nN}Y^{1} \| \le E + \epsilon \tag{2.5} \]

for some $N$.

By (2.2), given any $\epsilon > 0$, we can find an $N$ such that $N^{-1} E\{\log \| {}^{N}Y^{1} \|\} \le E + \epsilon$
(if $E$ is finite; $E = +\infty$ is excluded by hypothesis, and if $E = -\infty$
only minor modifications need be made). If the process is metrically transitive,
then, by the strong law of large numbers,

\[ \lim_{r \to \infty} r^{-1} N^{-1} \sum_{k=0}^{r-1} \log \| {}^{k+N}Y^{k+1} \| \le E + \epsilon. \tag{2.6} \]

If $r$ has the form $nN$, then, writing the sum in (2.6) as

\[ \sum_{s=1}^{N} \sum_{t=0}^{n-1} \log \| {}^{(t+1)N+s-1}Y^{tN+s} \| \]

and using (2.1), we deduce

\[ \limsup r^{-1} N^{-1} \big( \log \| {}^{nN}Y^{1} \| + \log \| {}^{nN+1}Y^{2} \| + \cdots + \log \| {}^{nN+N-1}Y^{N} \| \big) \le E + \epsilon. \tag{2.7} \]
Since, by (2.1),

\[ \| {}^{(n+1)N}Y^{1} \| \le \| {}^{(n+1)N}Y^{nN+j+1} \| \, \| {}^{nN+j}Y^{j+1} \| \, \| {}^{j}Y^{1} \| \]

and

\[ \limsup n^{-1} \log \| X^{n} \| \le 0, \]

we find that we may replace each $\| {}^{nN+j}Y^{j+1} \|$ in (2.7) by $\| {}^{(n+1)N}Y^{1} \|$. This gives

\[ \limsup (nN)^{-1} \log \| {}^{(n+1)N}Y^{1} \| \le E + \epsilon \]

from which (2.5) follows. This proves the theorem.

² $\log^{+} t = \max(\log t, 0)$.
We would like to strengthen (2.3) to read

\[ \lim n^{-1} \log \| {}^{n}Y^{1} \| = E. \tag{2.8} \]

That this may be done will be seen in the next theorem. For the proof we shall
require an auxiliary process which we proceed to define.³

Let $\Omega$ be the set of all sequences $\{(x_1, z_1), (x_2, z_2), \dots\}$ where the $x_n$ are
matrices in the range of the $X^i$ and the $z_n$ are matrices satisfying

\[ \| x_{n+1} z_n \| \, z_{n+1} = x_{n+1} z_n, \qquad \| z_n \| = 0 \text{ or } 1. \tag{2.9} \]

The variables $X^n$, $Z^n$ are now defined on $\Omega$ as the coordinate functions. The
subset of $\Omega$ on which $z_1 = x_1 / \| x_1 \|$ (with $z_1 = 0$ if $x_1 = 0$ and $z_{n+1} = 0$ if
$x_{n+1} z_n = 0$) may be taken as the sample space for the $X$ process, since on this
subset the $Z^n$ are functions of the $X^n$. Consequently we may define a measure
$\mu_1$ on $\Omega$ by carrying over to this subset the given probability measure of the $X$
process. Let $T$ be the shift operator on $\Omega$: $T\{(x_n, z_n)\} = \{(x_{n+1}, z_{n+1})\}$. Note
that since $z_2$ need not equal $x_2 / \| x_2 \|$ when $z_1 = x_1 / \| x_1 \|$, the subset considered
above will generally not be invariant under $T$. Hence the measure $\mu_1$ will
generally not be an invariant measure. Now define the measures $\mu_k$ on $\Omega$ by

\[ \mu_k(\Omega') = \mu_1(T^{-k+1} \Omega') \]

for $\Omega' \subset \Omega$. We then have

Lemma 1: Let $\nu_n = n^{-1} \sum_{k=1}^{n} \mu_k$. There exists a subsequence $\nu_{n_i}$ converging weakly
to a probability measure $\mu$ on $\Omega$ in the sense that the finite dimensional joint
distribution functions of the variables $X^n$, $Z^n$ with respect to the $\nu_{n_i}$ converge to the
corresponding distribution functions of the $X^n$, $Z^n$ with respect to $\mu$ at each
continuity point of the latter. The measure $\mu$ is stationary, i.e. invariant under $T$, and
on subsets of $\Omega$ defined by the $X^n$ alone, $\mu$ agrees with the given probability measure
of the $X$ process.

Proof: Any sequence of probability distributions on a finite dimensional
Euclidean space has a convergent subsequence; however, the limiting
distribution may not be a probability distribution, i.e. it may not assign probability 1
to the whole space. For the first part of the lemma it will suffice to show that a
convergent subsequence of the distributions in question must converge to a
probability distribution. Once this is shown the required subsequence $n_i$ is
obtained by a diagonal procedure. Since the $X$ process is stationary we observe
that the $\mu_k$ all agree with the original $X$ process measure on those subsets of $\Omega$
defined by the $X^n$. Hence the same is true of the $\nu_n$ and of any limiting measure of
the $\nu_n$. Now, by definition, $\| Z^n \| = 0$ or 1 on the support of any of the $\mu_k$,
and so the range of $(Z^1, Z^2, \dots, Z^m)$ is always compact. Let now $\{k_i\}$ be any
sequence of integers such that the joint distribution functions of $Z^1, \dots, Z^m$
and $X^1, \dots, X^m$ with respect to $\nu_{k_i}$ converge. The convergent sequence of
probability distributions is to be taken on the product of a compact space (the
range of $(Z^1, Z^2, \dots, Z^m)$) with a locally compact space (the range of
$(X^1, \dots, X^m)$) and they agree on sets defined by subsets of the locally compact
space (i.e. sets defined in terms of $X^1, \dots, X^m$). Moreover, by (2.9), the space
$\Omega$ is a closed subset of the product space. Under these circumstances it is easily
seen that the limiting distribution is a proper probability distribution. Thus the
limiting measure $\mu$ exists; that it has the desired properties follows readily.

³ A similar idea occurs in a technical report entitled "Electron Levels in a One
Dimensional Random Lattice," by H. L. Frisch and S. P. Lloyd of the Bell Telephone
Laboratories.

We may now prove (2.8).

Theorem 2. If $X^1, X^2, X^3, \dots$ form a metrically transitive stationary stochastic
process and

\[ E\{\log^{+} \| X^{1} \|\} < \infty, \]

then

\[ \lim n^{-1} \log \| {}^{n}Y^{1} \| = E \]

with probability 1, where $E$ is defined in Theorem 1.

Proof: Since $\log \| {}^{n}Y^{1} \| \le \sum_{k=1}^{n} \log \| X^{k} \|$, $E < \infty$. If $E = -\infty$, then the
theorem is a consequence of (2.3). Hence we may assume that $E$ is finite, and so

\[ \inf_n n^{-1} E\{\log \| {}^{n}Y^{1} \|\} > -\infty. \tag{2.10} \]

This implies that $P\{\| {}^{n}Y^{1} \| = 0\} = 0$ and in particular $P\{\| X^{1} \| = 0\} = 0$.
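Consider now the space $\Omega$ introduced before and the probability measures $\nu_{n_i}$
converging to $\mu$ on $\Omega$. The main step will be to prove the relations

\[ E = \liminf_{i \to \infty} \int_{\Omega} \log \| X^{2} Z^{1} \| \, d\nu_{n_i} \le \int_{\Omega} \log \| X^{2} Z^{1} \| \, d\mu < \infty. \tag{2.11} \]

By definition of the $\nu_n$, we have

\[ \int_{\Omega} \log \| X^{2} Z^{1} \| \, d\nu_{n_i} = n_i^{-1} \sum_{k=1}^{n_i} \int_{\Omega} \log \| X^{k+1} Z^{k} \| \, d\mu_1. \]

On the support of $\mu_1$ we have $Z^{1} = X^{1} / \| X^{1} \|$, and by induction, using (2.9)
and the fact that $\| {}^{n}Y^{1} \|$ vanishes with probability 0, we find that
$Z^{n} = {}^{n}Y^{1} / \| {}^{n}Y^{1} \|$. Hence on the support of $\mu_1$, $\| X^{k+1} Z^{k} \| = \| {}^{k+1}Y^{1} \| / \| {}^{k}Y^{1} \|$
so that

\[ \int_{\Omega} \log \| X^{2} Z^{1} \| \, d\nu_{n_i} = n_i^{-1} \int_{\Omega} \big( \log \| {}^{n_i+1}Y^{1} \| - \log \| X^{1} \| \big) \, d\mu_1 = n_i^{-1} E\{\log \| {}^{n_i+1}Y^{1} \|\} - n_i^{-1} E\{\log \| X^{1} \|\}. \]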







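The first equality of (2.11) now follows from the definition of $E$ and from the
fact that $E\{\log \| X^{1} \|\}$ is finite by (2.10) and the hypothesis of the theorem.
The first inequality of (2.11) is derived by showing that $\log^{+} \| X^{2} Z^{1} \|$ is
uniformly integrable with respect to the $\nu_{n_i}$. In fact, since $\log \| X^{2} Z^{1} \| \le \log \| X^{2} \| + \log \| Z^{1} \| = \log \| X^{2} \|$ when $\| Z^{1} \| = 1$, we have for $A \ge 0$,

\[ \begin{aligned} \int_{\log^{+} \| X^{2} Z^{1} \| \ge A} \log \| X^{2} Z^{1} \| \, d\nu_{n_i} &= n_i^{-1} \sum_{k=1}^{n_i} \int_{\log \| X^{2} Z^{1} \| \ge A} \log \| X^{2} Z^{1} \| \, d\mu_k \\ &\le n_i^{-1} \sum_{k=1}^{n_i} \int_{\log \| X^{2} Z^{1} \| \ge A} \log \| X^{2} \| \, d\mu_k \\ &\le n_i^{-1} \sum_{k=1}^{n_i} \int_{\log \| X^{2} \| \ge A} \log \| X^{2} \| \, d\mu_k = \int_{\log \| X^{1} \| \ge A} \log \| X^{1} \| \, d\mu_1. \end{aligned} \tag{2.12} \]

By hypothesis, $\int \log^{+} \| X^{1} \| \, d\mu_1 < \infty$, so that the last integral in (2.12) tends
to zero as $A \to \infty$. But,

\[ \int \log \| X^{2} Z^{1} \| \, d\mu = \int_{\log \| X^{2} Z^{1} \| \ge 0} \log \| X^{2} Z^{1} \| \, d\mu + \int_{\log \| X^{2} Z^{1} \| < 0} \log \| X^{2} Z^{1} \| \, d\mu, \]

and similarly when $\mu$ is replaced by $\nu_{n_i}$. By the uniform integrability of
$\log^{+} \| X^{2} Z^{1} \|$,

\[ \lim_{i \to \infty} \int_{\log \| X^{2} Z^{1} \| \ge 0} \log \| X^{2} Z^{1} \| \, d\nu_{n_i} = \int_{\log \| X^{2} Z^{1} \| \ge 0} \log \| X^{2} Z^{1} \| \, d\mu. \]

Since the following integrals have an integrand bounded from above and $\nu_{n_i} \to \mu$
one obtains

\[ \liminf_{i \to \infty} \int_{\log \| X^{2} Z^{1} \| < 0} \log \| X^{2} Z^{1} \| \, d\nu_{n_i} \le \int_{\log \| X^{2} Z^{1} \| < 0} \log \| X^{2} Z^{1} \| \, d\mu. \]

This proves the first inequality of (2.11). The remaining inequality is a
consequence of $E \log^{+} \| X^{1} \| < \infty$.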
Our theorem now follows easily. For, since $\mu$ is a stationary measure on $\Omega$ and
$\log \| X^{2} Z^{1} \|$ is integrable with respect to this measure, we may apply the ergodic
theorem ([3] p. 465) to find that

\[ \lim_{n \to \infty} n^{-1} \sum_{k=1}^{n} \log \| X^{k+1} Z^{k} \| = y \]

exists almost everywhere and $\int_{\Omega} y \, d\mu = \int_{\Omega} \log \| X^{2} Z^{1} \| \, d\mu \ge E$. Now by
(2.11) and the assumption that $E > -\infty$, $\| {}^{2}Y^{2} Z^{1} \| = \| X^{2} Z^{1} \| \ne 0$ almost
everywhere on $\Omega$. Then $Z^{2} = {}^{2}Y^{2} Z^{1} / \| {}^{2}Y^{2} Z^{1} \|$ a.e. and $X^{3} Z^{2} = {}^{3}Y^{2} Z^{1} / \| {}^{2}Y^{2} Z^{1} \|$.
Since, by (2.11), $\| X^{3} Z^{2} \| \ne 0$ a.e., it follows that ${}^{3}Y^{2} Z^{1} \ne 0$ a.e. and so
$Z^{3} = {}^{3}Y^{2} Z^{1} / \| {}^{3}Y^{2} Z^{1} \|$ a.e. and $X^{4} Z^{3} = {}^{4}Y^{2} Z^{1} / \| {}^{3}Y^{2} Z^{1} \|$. Continuing in this
manner we find for all $k$, $\| {}^{k}Y^{2} Z^{1} \| \ne 0$ a.e. and $X^{k+1} Z^{k} = {}^{k+1}Y^{2} Z^{1} / \| {}^{k}Y^{2} Z^{1} \|$ a.e. As
a result, $\| X^{k+1} Z^{k} \| = \| {}^{k+1}Y^{2} Z^{1} \| / \| {}^{k}Y^{2} Z^{1} \|$, and we have

\[ \lim n^{-1} \log \| {}^{n+1}Y^{2} Z^{1} \| = y \quad \text{a.e.} \]

Hence

\[ \liminf n^{-1} \log \| {}^{n+1}Y^{2} \| \ge y \quad \text{a.e.} \tag{2.13} \]

with respect to $\mu$ (and hence with respect to $\mu_1$). However, by Theorem 1, the
left side of (2.13) is a.e. $\le E$, and since the integral of the right hand side of
(2.13) is greater than or equal to $E$ we find that the left hand side is equal to
$E$ a.e. This completes the proof.
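The quantity $n^{-1} \log \| {}^{n}Y^{1} \|$ is easy to compute for a given sequence of matrices, provided the running product is renormalised so that it neither overflows nor underflows. The sketch below is an illustration with hypothetical function names. It uses the maximum absolute row sum as the norm, as in the definition at the start of this section, and accumulates the logarithms of the normalising factors; after the final renormalisation the remaining norm is 1 up to rounding, so the accumulated sum is $\log \| {}^{n}Y^{1} \|$.

```python
import math
import random


def norm(a):
    """Maximum absolute row sum of a square matrix given as a list of rows."""
    return max(sum(abs(x) for x in row) for row in a)


def matmul(a, b):
    """Product of two k x k matrices."""
    k = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]


def growth_rate(matrices):
    """n**-1 * log || X^n ... X^1 || for matrices = [X^1, ..., X^n]."""
    k = len(matrices[0])
    y = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    log_scale = 0.0
    for x in matrices:
        y = matmul(x, y)            # left-multiply by the next factor
        s = norm(y)
        if s == 0.0:
            return -math.inf        # the product has vanished
        log_scale += math.log(s)
        y = [[v / s for v in row] for row in y]
    return log_scale / len(matrices)


def random_matrix(rng, k, low, high):
    """k x k matrix with independent uniform entries (illustrative choice)."""
    return [[rng.uniform(low, high) for _ in range(k)] for _ in range(k)]
```

Feeding `growth_rate` longer and longer samples from `random_matrix(random.Random(0), 2, 0.5, 1.5)` gives values that settle down, in line with the almost sure limit asserted by the theorem. By (2.1) the result never exceeds the average of the $\log \| X^{i} \|$.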


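3. Asymptotic behavior of the entries. In this section we always make the
following assumption: A I: The possible matrix values $M$ for $X^1$ all satisfy

\[ M_{i,j} > 0, \tag{3.1} \]

and

\[ 1 \le (\max_{i,j} M_{i,j}) / (\min_{i,j} M_{i,j}) \le C < \infty. \tag{3.2} \]

Lemma 2. If A I is satisfied, then

\[ ({}^{n+m}Y^{m})_{i,j} > 0, \tag{3.3} \]

and

\[ C^{-2} \le \frac{({}^{n+m}Y^{m})_{i_1,j_1}}{({}^{n+m}Y^{m})_{i_2,j_2}} \le C^{2}. \tag{3.4} \]

Proof. (3.3) is obvious and so is (3.4) if $n = 0$. If $n \ge 2$, we have

\[ \frac{({}^{n+m}Y^{m})_{i_1,j_1}}{({}^{n+m}Y^{m})_{i_2,j_2}} = \frac{\sum_{r,s} (X^{n+m})_{i_1,r} \, ({}^{n+m-1}Y^{m+1})_{r,s} \, (X^{m})_{s,j_1}}{\sum_{r,s} (X^{n+m})_{i_2,r} \, ({}^{n+m-1}Y^{m+1})_{r,s} \, (X^{m})_{s,j_2}} \]

and this is $\le C^2$ by (3.2). A similar estimate holds for $n = 1$. This completes
the proof of the lemma.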
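The bound on entry ratios can be observed on randomly generated positive matrices. The sketch below is an illustration only, with hypothetical function names: entries are drawn uniformly from $[1, C]$ so that the ratio of largest to smallest entry of each factor is at most $C$, and the ratio of the largest to the smallest entry of the product is then compared with $C^2$.

```python
import random


def matmul(a, b):
    """Product of two k x k matrices given as lists of rows."""
    k = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]


def product(matrices):
    """Return X^n ... X^1 for matrices = [X^1, ..., X^n]."""
    y = matrices[0]
    for x in matrices[1:]:
        y = matmul(x, y)
    return y


def entry_ratio(y):
    """Largest entry divided by smallest entry of a positive matrix."""
    flat = [v for row in y for v in row]
    return max(flat) / min(flat)


def random_positive_matrix(rng, k, c):
    """Entries uniform on [1, c], so that max/min <= c for the matrix."""
    return [[rng.uniform(1.0, c) for _ in range(k)] for _ in range(k)]
```

However many factors are multiplied, `entry_ratio(product(mats))` stays below $C^2$, because only the leftmost factor controls the dependence on the row index and only the rightmost factor controls the dependence on the column index.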
We remark that with A I, quotients of the form 


Cor 

Pra ck 
are of the same order as (X"*”*"),, in the sense that the ratio of the two is 
bounded away from zero and infinity. 

Corollary: If $X^1, X^2, \dots$ is a stationary stochastic process satisfying A I,
then $\lim_{n \to \infty} n^{-1} E\{\log ({}^{n}Y^{1})_{i,j}\}$ exists and is the same for all $i$ and $j$, say $E$. If, in
addition, the $X$-process satisfies the conditions of Theorem 2, then for all $i$ and $j$,

\[ \lim_{n \to \infty} n^{-1} \log ({}^{n}Y^{1})_{i,j} = E \]

with probability 1.

Proof: (3.3) and (3.4) imply

\[ \min_{i,j} ({}^{n}Y^{1})_{i,j} \le \| {}^{n}Y^{1} \| \le k \max_{i,j} ({}^{n}Y^{1})_{i,j} \le k C^{2} \min_{i,j} ({}^{n}Y^{1})_{i,j}. \tag{3.5} \]

Theorems 1 and 2 now give the required result.

The first part of this corollary is a generalization of the result in [1].

Lemma 3. If we define ("*"Y"); = Do; ("*"¥™);,; and if A I is satisfied, then 
nr, (Me aaa 
(*+*Y*) ("+= Y™)., = G1 Cc ) = 

Proor: A straightforward computation shows that 

er as es re ae 

(mtrti Ym) (mertiYm) 


a >{ (xr inal YD. (xr ae Y De | ee 
‘ (#+rY*) 


(3.6) 


(mtr Yo) és (merti 1ymtr).. 
But, for all ¢, 
r! (x), AY) 


(3.7) 


all the summands in (3.7) are positive and by (3.4), 
(yer a Xe co (xe 6, ata ). 


2 (mtrtt Y*)i, (ores y*),, 


(3.8) 2 
We proceed now as in [3], pp. 173-174. Let $S'$ be the set of indices $s$ for which
the summand in the right hand side of (3.6) is positive and $S''$ the set for which
the summand is negative. By (3.3), (3.7) and (3.8)


0< 


> a te SoC Fh te ee el 
= (merely se CONT da f 


f Ee Pine (wry me _ Sa i ‘ (oe ml 


eck” \ era are ae (REY )y 


s(1-C*). 


Splitting the sum in (3.6) up into a sum over S’ and one over S” one obtains 
from (3.6)-(3.9), 
(= +1yym ds, ; eer ae 


max" Caraiye), — MIN“ Corratpmy, S (1 — *){ mas 


(PY) 55 

Cw ae 

(*"Y™)s5 | 
Crr aay 
The proof is now completed by induction on r (compare [3], pp. 173-174). 


— min - 







Consider again the (X, Z) process defined by the stationary measure u on 2 
as in Lemma 1. Put 


(X*Z') (ew 
(3.10) a= [ 106 “ZI; 4 <* d w= | log (ZF); 3 


_ (xz da (X°Z aa 
c= ff (toe Wha Now Bf m Na 


ese lS Y( -* )s : 
== / ( aI), — a) log “ary o «) du, 


(3.11) 


and
(3.12) $b = c_1 + 2\sum_{r=2}^{\infty} c_r$.


(The convergence of the series in (3.12) will follow from the proof of Theorem 3.) 

In order to prove the asymptotic normality of $\log ({}^nY^1)_{1,1}$ we introduce the
following “independence assumption” A II: If $\Omega'$ is a measurable set in the sample
space of the X-process, defined in terms of $X^{m+n+1}, X^{m+n+2}, \ldots$ only, then
$|P\{\Omega' \mid X^1, \ldots, X^m\} - P\{\Omega'\}| \le D_1\lambda_1^n P\{\Omega'\}$, where $D_1$ and $\lambda_1$ are some fixed
positive constants and $\lambda_1 < 1$.

Note that A II is satisfied if the $X^i$ are mutually independent or if the
X-process is an aperiodic Markov chain with finitely many states (i.e. $X^i$ can only
take finitely many values) with one ergodic class ([3], Ch. V, §2).

In addition to A I and A II we shall need a condition regarding the moments
of $\log (X^1)_{1,1}$. We then have

Theorem 3. If A I and A II are satisfied, and if

(3.13) $E|\log (X^1)_{1,1}|^{2+\delta} < \infty$

for some $\delta > 0$, and if $a$ and $b$ are given by (3.10) and (3.12), then

(3.14) $\lim_{n\to\infty} P\left\{ \dfrac{\log ({}^nY^1)_{i,j} - na}{(nb)^{1/2}} < x \right\} = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-t^2/2}\,dt$

when $b \ne 0$. If $b = 0$, then

(3.15) $(\log ({}^nY^1)_{i,j} - na)/(n)^{1/2} \to 0$ in probability.


Remark. Since, by (3.4), $|\log ({}^nY^1)_{i_1,j_1} - \log ({}^nY^1)_{i_2,j_2}| \le 2 \log C$, it suffices
to prove the result for $\log ({}^nY^1)_{1,1}$. This theorem will then give the joint
limiting distribution of all $\log ({}^nY^1)_{i,j}$.

Moreover, (3.5) shows that (3.14) (or (3.15)) also holds for $\log \|{}^nY^1\|$
instead of $\log ({}^nY^1)_{i,j}$.

Proof. As remarked, we can take $i = j = 1$. Then $\log ({}^nY^1)_{1,1} = \sum_{k=1}^{n} \xi_k$
where

$\xi_k = \log\{({}^kY^1)_{1,1}/({}^{k-1}Y^1)_{1,1}\}$ $(k > 1)$ and $\xi_1 = \log ({}^1Y^1)_{1,1}$.







We shall show that Bernstein’s central limit theorem [2] dealing with “almost 
independent” random variables is applicable. Strictly speaking, Bernstein’s 
theorem would require E | log (X'),; |" < ©. We shall therefore follow the 
treatment given by Doob in [3], Ch. V, §7 for the special case of Markov chains. 
This will show that our conditions are sufficient. We may and will choose $\delta$
fixed and $0 < \delta < 1$ such that (3.13) is satisfied.

The heart of the proof is the expansion (cf [3] p. 38): 


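(3.16)
$$E\Big\{\exp\Big[it\gamma \sum_{s=m+n+1}^{m+n+k} (\xi_s - a)\Big] \,\Big|\, X^1, \ldots, X^m\Big\}
= 1 + it\gamma \sum_{s=m+n+1}^{m+n+k} E\{\xi_s - a \mid X^1, \ldots, X^m\}$$
$$\qquad - \frac{(t\gamma)^2}{2}\, E\Big\{\Big(\sum_{s=m+n+1}^{m+n+k} (\xi_s - a)\Big)^2 \,\Big|\, X^1, \ldots, X^m\Big\}
+ O\Big(|t\gamma|^{2+\delta}\, E\Big\{\Big|\sum_{s=m+n+1}^{m+n+k} (\xi_s - a)\Big|^{2+\delta} \,\Big|\, X^1, \ldots, X^m\Big\}\Big).$$
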


We shall show that there exist positive constants $D$ and $\lambda$ with $0 < \lambda < 1$ such
that


(3.17) | Et é _ | X; fit 2 X nj | Ss D ar", 


( m+n+k 2 
E (ie ¥ g, ~— a) Xi yews, x.} me Ee 
eom+n+i 


(3.18) 


k | 


k 
—27*S S c-1} S Dr’, 


e=l re+l 


and 
m+k 


(3.19) E{| > (& = a)(?**" xX, we X.! < Dt" ® 


s=m+i 


for $\delta' = 0$ or $\delta$. The relations (3.16)-(3.19) replace the Lemmas 7.1-7.4 in [3],
pp. 222-228, and with their help one can then compute

$$E\Big\{\exp\Big[it\gamma \sum_{s=1}^{n} (\xi_s - a)\Big]\Big\} \quad \text{for } \gamma = (nb)^{-1/2}$$

and complete the proof along the lines of pp. 228-230 in [3].
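If $A$, $B$, $C$ are $k \times k$ matrices, then

(3.20) $(ABC)_{1,1}/(BC)_{1,1} = \sum_i A_{1,i}(BC)_{i,1}/(BC)_{1,1}$
$\qquad = \sum_i A_{1,i}(B_i/B_1) + \Big(\sum_i A_{1,i}B_i \sum_j (B_{i,j}/B_i - B_{1,j}/B_1)C_{j,1}\Big)\Big/\Big(\sum_j B_{1,j}C_{j,1}\Big)$,

where $B_i = \sum_j B_{i,j}$. Now take⁴


⁴ $[x]$ denotes the largest integer $\le x$.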







$A = X^{m+n+1}$,
$B = {}^{m+n}Y^{m+[n/2]+1}$,
$C = {}^{m+[n/2]}Y^1$.
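
The decomposition (3.20) is a purely algebraic identity, so it can be checked numerically. The sketch below reads (3.20) as $(ABC)_{1,1}/(BC)_{1,1} = \sum_i A_{1,i}B_i/B_1 + \big(\sum_i A_{1,i}B_i\sum_j (B_{i,j}/B_i - B_{1,j}/B_1)C_{j,1}\big)/\big(\sum_j B_{1,j}C_{j,1}\big)$ with $B_i = \sum_j B_{i,j}$, and evaluates both sides for random positive matrices. The matrices and helper names are illustrative assumptions only.

```python
import random

def matmul(a, b):
    k = len(a)
    return [[sum(a[i][s] * b[s][j] for s in range(k)) for j in range(k)]
            for i in range(k)]

def random_positive_matrix(rng, k):
    return [[rng.uniform(1.0, 2.0) for _ in range(k)] for _ in range(k)]

def both_sides_of_identity(k=3, seed=1):
    rng = random.Random(seed)
    A = random_positive_matrix(rng, k)
    B = random_positive_matrix(rng, k)
    C = random_positive_matrix(rng, k)

    BC = matmul(B, C)
    ABC = matmul(A, BC)
    lhs = ABC[0][0] / BC[0][0]           # index 0 plays the role of index 1 in the text

    row_sum = [sum(B[i]) for i in range(k)]   # B_i = sum_j B_{i,j}
    first = sum(A[0][i] * row_sum[i] / row_sum[0] for i in range(k))
    numerator = sum(
        A[0][i] * row_sum[i] * sum(
            (B[i][j] / row_sum[i] - B[0][j] / row_sum[0]) * C[j][0]
            for j in range(k))
        for i in range(k))
    denominator = sum(B[0][j] * C[j][0] for j in range(k))
    return lhs, first + numerator / denominator

lhs, rhs = both_sides_of_identity()
print(abs(lhs - rhs) < 1e-9)
```

The first term depends on $B$ only through its row sums, while the second term is small whenever the normalized rows $B_{i,j}/B_i$ are nearly independent of $i$, which is what Lemma 3 provides.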


With these substitutions it follows from Lemma 2 applied to ratios such as 
A; ,;,/A1,i, and B; ;,/B;,;, , and from Lemma 3, that the second term in the right 
hand side of (3.20) is O((X"*"*),,(1 — C*)"*) uniformly in the matrices 
Xi, -°**, Xm4n. Hence, again using Lemma 2 and the identity 


log (a + 8) = loga + log (1 + (8/a) = loga + O(B/a) 
as 8/a — 0, one has, uniformly in X,, --- , Xmain 
(mtegerers } 


(m+n Ytin/2i i 


Emtn+i = loe{ , (er is 
(3.21) ; 


Ou c* n/2 | Peery), or y~3\n/2 
+ OU — Co)" = ls Sayeremy, +O - Cc)". 
This shows, in particular, that (again by Lemma 2) | many: — log (X"*"*"), , | 
is bounded and hence (using A II and (3.13)), E{| Emsna:| 77? | Xi, ---, Xa 
is bounded. 

We also obtain from (3.21) 
(stot patie). | ) 


|Xa,---, Xa? 


( | 
EY Emons | Xi °° Xm} = E{ og 


(mtn y+ {n/2) bi 
(3.22) (ator at ial ):\ 


+0(11-C*)" = Blog (min Yorinial), f 


+ 0(1 — C*)"* + O(at”) 


uniformly in $X^1, \ldots, X^m$. The last equality is a consequence of A II because
the expression between the braces depends only on $X^{m+[n/2]+1}, \ldots, X^{m+n+1}$, and
the uniform boundedness of


. ‘tase soedeneed " 
B{ log “(orn Yotinialy, | 


(ef. remark after Lemma 2). Setting 4» = {max (1 — C™*, \,)}'?(< 1) and 
using the stationarity of the X-process, (3.22) can be rewritten as 


(3.23) El Emanai | X1, °° Xm} = E€n—tnijg2 + O(A2) 


uniformly in X,, --- , X. But, by (3.21) and its analogue, derived by putting 
A = X*", B = *Y’, C = Z' in (3.20), one has 
eS Fes Sy SE ch De 

(X* eee X')ia og (X* eee X*), 4 


typ 1 (XP +s BZ") 
+ O11 — C“)* = log gi, 


fer = log 
(3.24) 
+0(1 —c”*)*, 







as long as (Z'),; > 0. However, by Lemma 2 and the construction of 
u, (Z');; > 0 a.e. with respect to wu. Therefore 


. si rt ++ Xia 
Bisa = J tow :-- Shy 


" er X*2' ia -4)s 
* [toe Tet + O11 — C*) 


= fog FF du + O01 — 0) =a+ Ql — 


(3.22) and (3.25) prove (3.17) for a suitable $D$ and $\lambda \ge \lambda_2$.
As in (3.22) and (3.25) one shows


E\ (Emsntr — @)(Emsngs — @) | Xi, °-- , Xmf 
(3.26) = E\(En-tnjar — @)(En-tniigs — @)} + OAT) = c + OCT) 
uniformly in X,, --- , Xm. At the same time, using (3.17) 

E\ (Emsnsr — @)(Emensis) — @)| Xi, --+, Xl 
(3.27) = El (Emsng: — @)E | (Emsnge — @) | Xa ++ Xegnga} Xa - ++ Xu} = O(A3) 


uniformly in X,, --- , Xm. Hence also c, = O(d:) so that the series in (3.12) 
converges. (3.18) can now be proved quite easily, as 


m+n+k min+k 
oe fa) |X 1+ Kq b= ie > E(t —a)*|Xi---Xe} 


smm+n+l sommen+l 


mint+k mink 


+2k" Do x B{(t, — a)(t — a)|Xi--- Xn}, 


some nt+l ree+l 


E\(& — a)’|Xi--- Xa} =e +002), 
and 


m+n+k e+u 


yi E{(&, — a)(& —a)|Xy---Xe} = Do Cross + WAT”) + OCP). 
Taking u = min (m + n + k — 8, 8 — m) we obtain (3.18) fora \ < 1 satis- 
fying nz S D.d" for some D, and all positive n. (3.18) of course implies (3.19) 
for 6’ = 0. The proof of (3.19) can be completed as in Lemma (7.4) p. 225 in 
[3]. In fact, if 
m+k+n m+k+2n 


uu. = ea (&—a) and », = ie. (g.-—a), 

then 

Ej | un + vn | 774 | Xy- ++ Xu} S Ef (ten + 0n)*(| te |? + fon |) | Xa --- Xel 
S E{j un |? | Xi--- Xn} + Eilon |? | Xi --- Xe} 


+ 2E{ju.|'* |.) |Xi--- Xa} + 2E{| un] |v. |°" |X, ---, Xa}. 







However, E{| un |*** |v, | | Xi --+ Xm 
S E{| un | E{| vn | | Xi +++ Xmpesn} | Xa, +++, Xu} 
E{| us | (Ef| on |? | Xi +++ Xmseen})? | Xi, --+, Xu} 
Dn" (E} *|Z,--- x4) 
Dn'?(Dn)“*?? ie D824, 20/2 
and similarly for E{| u, || v,|'**| X,--- Xn}. Hence 


E{ | Un + Un ‘ X,,° ++, Xm} 


3.28) mtkin ies 

< 2sup Bi >,  @—a)]"”|x,---X.) +20. 
k>0 smm+k+1 

The inequality (3.28) replaces (7.11), p. 226, in [3] if c, in [3] is replaced by 


( > > 
sup HY) 2 (&, — a) |?" |X wat 
k20 s=m+k+1 } 
The remainder of the proof of (3.19) can be copied from pp. 226-227 of [3] and 
the theorem now follows from (3.16)—(3.19) as indicated before. 


4. Two examples. The first example shows that one cannot prove the corollary
to Theorem 2 without some positivity assumption as in (3.1), even though
Theorems 1 and 2 show that the corresponding result for the norm is true
without A I.

Example 1. Take the $X^i$ mutually independent with the distribution


Pix ( g}ns 
pi x= (? )} =. 


\ 
ky 


In this case "Y' is sometimes of the form (( n) and sometimes of the form 


0 2 
2"* 0 
n'E log ("Y’)14 ° 
Since $\log \|{}^nY^1\|$ behaves slightly better than $\log ({}^nY^1)_{1,1}$, as indicated by the
last example, one might hope that the central limit theorem would hold at least
for $\log \|{}^nY^1\|$ without A I. The following example shows that this is also false.
Example 2. Take the $X^i$ mutually independent with the same distribution.


All X’ are of the form ( . ) where \; and , are independent strictly positive 


1 
(° v4 ) where 0 < lim kj/n < 1. Hence n™ log ("Y'), 1 has no limit, nor has 


0 A 
random variables with the same distribution, and E log \; = 0, E(log\,)* = 

Then $({}^nY^1)_{1,2} = ({}^nY^1)_{2,1} = 0$ while $n^{-1/2}\log ({}^nY^1)_{1,1}$ and $n^{-1/2}\log ({}^nY^1)_{2,2}$ are
independent random variables, both asymptotically normal, with zero mean and
unit variance. Since $n^{-1/2}\log \|{}^nY^1\|$ is the maximum of these two variables it
is not asymptotically normal, nor is $n^{-1/2}(\log \|{}^nY^1\| - nd)$ for any $d$.
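
Example 2 can be illustrated by simulation. In the sketch below the $X^i$ are diagonal with independent entries $\lambda_1, \lambda_2$, and $\log\lambda_1$, $\log\lambda_2$ are taken to be standard normal; that distribution is an assumption made only so that the moment conditions of the example (zero mean, unit variance) hold. The normalized log norm is then the maximum of two independent standard normal variables, whose mean is $1/\sqrt{\pi}$ rather than 0.

```python
import math
import random

def normalized_log_norm(n, rng):
    # log(lambda_1) and log(lambda_2) are standard normal here (an assumption).
    s1 = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    s2 = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    # A product of diagonal matrices is diagonal, so the log of its norm
    # is the larger of the two accumulated log entries.
    return max(s1, s2) / math.sqrt(n)

rng = random.Random(0)
samples = [normalized_log_norm(20, rng) for _ in range(20000)]
mean_max = sum(samples) / len(samples)

# Sample mean of the normalized log norm versus the mean of the maximum
# of two independent standard normal variables.
print(round(mean_max, 3), round(1.0 / math.sqrt(math.pi), 3))
```

A limit law with strictly positive mean and a skewed shape cannot be obtained from a normal law by the centering $nd$ alone, which is the point of the example.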


REFERENCES 

[1] Richard Bellman, “Limit theorems for non-commutative operations I,” Duke Math.
J., Vol. 21 (1954), pp. 491-500.

[2] Serge Bernstein, “Sur l'extension du théorème limite du calcul des probabilités
aux sommes de quantités dépendantes,” Math. Ann., Vol. 97 (1927), pp. 1-59.

[3] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.

[4] G. Pólya und G. Szegö, Aufgaben und Lehrsätze aus der Analysis, Aufgabe I, 2te
Auflage, Springer-Verlag, Berlin, 1954.





AN EXPONENTIAL BOUND FOR FUNCTIONS OF A
MARKOV CHAIN¹


By Melvin Katz,² Jr. and A. J. Thomasian
University of California, Berkeley


1. Introduction. An explicit and relatively simple exponential bound is obtained
for $P\{|n^{-1}\sum_{i=1}^{n} f(X_i) - \mu| \ge \epsilon \text{ for some } n \ge m\}$, where $X_1, X_2, \ldots$ is a finite
state ergodic Markov chain with arbitrary initial distribution, $f$ is any real-valued
function, and $\mu$ is the expected value of $f(X_1)$ computed under the unique initial
stationary measure. Bounds for the one-sided inequalities are also given. Because
the assumptions are weak and permit the transition matrix to contain
zeroes, the result can be applied to multiple Markov chains (Doob [4], p. 185)
and thus to sums of the form $S_n = \sum_{i=1}^{n} f(X_i, X_{i+1}, X_{i+2})$. The proof employs
methods of recurrent event theory that have been used by Chung [2], and Doblin
[3]. Asymptotic results for the one-sided inequalities have been obtained by
Koopmans [7], under the restriction that the transition matrix contains no zeroes.
Some possible applications for such bounds can be seen in Chernoff [1], Khinchin
[6], and Koopmans [7].


2. Notation and summary. Let $P = (p_{ij})$ be an $r \times r$ stationary transition
matrix with $r \ge 2$ and, using the terminology in Doob [4], assume that there is
only one ergodic class of states $E \subset R = \{1, 2, \ldots, r\}$. Let $T \subset R$ be the (possibly
empty) class of transient states and let $p_1, \ldots, p_r$ be the stationary distribution
for $P$. We denote the smallest positive element of $P$ by $p$. Let $X_1, X_2, \ldots$
be the Markov chain determined by $P$ and an arbitrary initial distribution for $X_1$,
so that $X_n = j$ if the process is in state $j$ at time $n$. Now let $f$ be a real-valued
function on $R$ and let $S_n = \sum_{i=1}^{n} f(X_i)$, $\mu = \sum_{k=1}^{r} p_k f(k)$, and $M =
\max_{k \in R} f(k) - \min_{j \in R} f(j)$.
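
The quantities just defined are easy to compute for a concrete chain. The sketch below is illustrative only: the $3 \times 3$ transition matrix (which contains zeroes) and the function $f$ are assumptions chosen for the example, and the stationary distribution is found by repeatedly applying $P$, which converges here because the chosen chain is aperiodic with a single ergodic class.

```python
def stationary_distribution(P, iterations=500):
    """Approximate the stationary row vector by iterating pi <- pi P."""
    r = len(P)
    pi = [1.0] + [0.0] * (r - 1)   # start in the first state
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(r)) for j in range(r)]
    return pi

# Hypothetical example: a transition matrix with zero entries.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
f = [0.0, 1.0, 2.0]

pi = stationary_distribution(P)
mu = sum(pk * fk for pk, fk in zip(pi, f))        # mu = sum_k p_k f(k)
M = max(f) - min(f)                               # range of f
p = min(x for row in P for x in row if x > 0)     # smallest positive element of P

print([round(x, 6) for x in pi], round(mu, 6), M, p)
```

For this doubly stochastic example the stationary distribution is uniform, so $\mu$ is the plain average of $f$.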

The notation and assumptions which have been made will be used throughout
the paper except for the countable state space example at the end. We can now
state the following

Theorem. Let $m$ be a positive integer and let $\epsilon > 0$. Then

$P\{|n^{-1}S_n - \mu| \ge \epsilon \text{ for some } n \ge m\} \le 2Ae^{-Bm\epsilon^2}$,

$P\{S_n \ge n(\mu + \epsilon) \text{ for some } n \ge m\} \le Ae^{-Bm\epsilon^2}$,

$P\{S_n \le n(\mu - \epsilon) \text{ for some } n \ge m\} \le Ae^{-Bm\epsilon^2}$,


Received October 8, 1959; revised January 6, 1960. 

¹ This paper was prepared with the partial support of the Office of Naval Research
(Nonr-222-53). This paper in whole or in part may be reproduced for any purpose of the
United States Government.

² Now at the University of Chicago.







Sr il B p” 
Jing ~ 2M*’ 

3. Proof of the theorem. We first introduce some additional notation. For
$i \in E$ let $V_0^i = 0$ and for $k = 1, 2, \ldots$ let $V_k^i$ be the time to the kth occurrence
of state $i$. Thus $V_k^i = l$ if $X_l = i$ and exactly $k$ of $X_1, \ldots, X_l$ are equal to $i$.
The $V_k^i$ are defined with probability one. Now for $k = 1, 2, \ldots$ we let

$$\nu_k^i = V_k^i - V_{k-1}^i, \qquad u_k^i = \sum_{j=V_{k-1}^i+1}^{V_k^i} f(X_j), \qquad U_s^i = \sum_{k=1}^{s} u_k^i = \sum_{j=1}^{V_s^i} f(X_j).$$

It is well known that $\nu_1^i, \nu_2^i, \ldots$ are independent and $\nu_2^i, \nu_3^i, \ldots$ are identically
distributed. Similarly, we have $u_1^i, u_2^i, \ldots$ are independent and $u_2^i, u_3^i, \ldots$ are
identically distributed.

The basic idea behind the proof is to express the event $[S_n \ge n(\mu + \epsilon)$ for
some $n \ge m]$ as a subset of a larger event and for the larger event employ the
properties of the random variables $U_s^i$ and $V_s^i$ in order to obtain a bound on the
probability of its occurrence. Once this bound is obtained the other two follow
quite readily and the proof is complete. To obtain the initial result it is necessary
to prove the following lemma. The bound and the method of proof of the lemma
are similar to S. Bernstein's inequality, Uspensky [9].

Lemma. [fie E,M s 1/r, —4 S uw < 0,8 2 O then 


yn 4 But (6+0—1) 
P\U, 2 = 6} s — e€ - 
a 


where 8 = p”/2’. 

Proof. The state $i \in E$ will be arbitrary but fixed throughout the proof of the
lemma so that we will not exhibit it in $\nu_k$, $u_k$ and $U_s$. We take $t > 0$ and apply
a known inequality (Loève [8], p. 158, (1)) to obtain


P\U, 2 —pd} s Bet"? = ce Ee™( Ee)". 


Now let a = 1 — p’ so that 0 S a < 1 and for small enough t > 0 we have 
ae’ < 1. It is known (e.g., Feller [5], p. 378) that for k = 0, 1, --- (defining 
0° = 0) Pin, > kr} < a’. Since the distribution of X; is arbitrary it is clear that 
the same inequality applies to v:. We now bound Ee™' 


k=l 


Ee“ = > Ele |», = k)P(y, = k) Ss Doc “’*P(v, = k). 
kel 


min jer f(j) < 0. Thus maxjexf(j) < M S 1/r and therefore e“' < e’”” 
Further 


The inequality follows from the fact that u,; S  maxjer f(j) and, since » < 0; 


a 


. 2r 
> et! P(r =k) = De” Pio, =k) + DY ec Ply =k) +--- 


k=l k=l ker+1 







Therefore Ee' < e'/(1 — ae‘). Taking a finite Taylor’s expansion of Ee'“*, we 
get 


2 
Ee“? =1+tEum+ 4 E(w)’ e' 0<t' <t. 


Using the same method as was used on Ee'', we obtain 


E(us)* e"“* = >> El(us)* e' “* | v2 = kIP (v2 = k) 
kel 


~o k 2 
<5 (Ae pen =) 


kewl 
tO yt yk t 1 + ae’ 
s e 2, W(e'a) =e (1 — ae’)*” 
We now let { = —2y8 so that t < p'/8 < } and therefore e' < 2. Thus 


r 


‘ r 2r 
S$ 1+2and ae’ = (1—p)e'si~ P47 51-2. 


Hence ae‘ < 1 and 1 — ae‘ = p’/2. Using these facts we see that Ee'“' < 4/p’ 


and E(w)*e “* < 1/28. Now from Chung [2] we have Eu, = wEr: S wp < 080 
that 


“e tu +( 42/48) avfin® 
Ee’? < e™ /48 ne” 


and clearly 


—9y2 
eM ae *™™ <e 
So the proof of the lemma is complete. 


We turn now to the proof of the theorem and note that $[S_n \ge n(\mu + \epsilon)$ for
some $n \ge m]$ is contained in


/U[X,¢ T\\u!U U [S, = n(ut+e), X, =7] 
(nam } (sek nom 


} 
/ 


so that 


P{S, = n(u +e) forsome n= mj s PiU [X,€ T]} 


2m 


+r max P{U[S, = n(u + ¢), X, = i}. 
ick az2m 
Now clearly 
U[S, = n(ut+),X, =i) = U(US 2 Vin + 6), Viz m) 


nom sai a 


so that forice FE 





\ 
s=l 


<>P Ui=Vi(w + ‘) atm 


emi 2 


5 EM gy [Ui- vi (4+ J] aie 


az 
<> aaa Blei2mr)timte—t) 1e"™ 
1p 


or ’ 


P{ UIs, 2n(ut+e), X, = i)\ \s DP{u: —Vilut+e) 20,5 5 Vez 


2”) 
\ 


where for the last inequality we applied the lemma with f(k) replaced by 


f(k) — (u+¢/2) 


g{k) = Mr 


and used 


€ l 

aMr =~ 7’ 

where $\epsilon \le M$ is assumed, since the theorem is trivial for $\epsilon > M$. Since $(A/2) > 1$
we need only show that


s ; | “ 
max g(k) — ming(j)=-, Di glk)m=- 
keR jer r k=l 


P| U(X, « T}} eer 
nazm 
in order to obtain the bound for $P\{S_n \ge n(\mu + \epsilon)$ for some $n \ge m\}$.
Now we note that for $i \in E$
> U(X, 7 < Plvi > m|} 


Aes 
and, recalling that $P\{\nu_1^i > kr\} \le \alpha^k = (1 - p^r)^k$, we obtain
SU [X,, e 7’ —p pares. | 


| ies 
Now if « S M and (m/r) s 2° we see that Ae —_ = l6e ae > tis? > l, 
making a theorem trivial in this case. Thus we may assume that (m/r) > 2° 
so that (m/r)(1 — 2°) 2-1 and so 
m m 


2 
4 -lzze 5; BM'm 


Thus forie E 
P! U [X,, € T| < (1 re rn < é p’((m/r)—1) < e outa 
{nzm 
and we have the bound for $P\{S_n \ge n(\mu + \epsilon)$ for some $n \ge m\}$.
The other two inequalities of the theorem now follow immediately. We define
the function $g(k) = -f(k)$ for $k = 1, \ldots, r$. Then $-\mu = \sum_{k=1}^{r} p_k g(k)$
and







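$$P\Big\{\sum_{j=1}^{n} f(X_j) \le n(\mu - \epsilon) \text{ for some } n \ge m\Big\}
= P\Big\{\sum_{j=1}^{n} g(X_j) \ge n(-\mu + \epsilon) \text{ for some } n \ge m\Big\} \le Ae^{-Bm\epsilon^2}.$$

For the remaining inequality we have that $P\{|n^{-1}S_n - \mu| \ge \epsilon$ for some
$n \ge m\} \le P\{S_n \le n(\mu - \epsilon)$ for some $n \ge m\} + P\{S_n \ge n(\mu + \epsilon)$ for
some $n \ge m\} \le 2Ae^{-Bm\epsilon^2}$. This completes the proof of the theorem.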

We close with an example of a Markov chain with a countable state space
for which there exist no constants $A$ and $B$ such that $P\{S_n \ge n(\mu + \epsilon)$ for
some $n \ge m\} \le Ae^{-Bm\epsilon^2}$. Let the state space be $R = \{1, 2, \ldots\}$ and for $j \in R$
let pry = c/j’ where » oe me =c'. For j > 1 let p;j;. = 1 and pe = 0 for 
$k \ne j - 1$. Then this matrix $P = (p_{ij})$ admits a unique stationary distribution
$p_1, p_2, \ldots$ (Feller [5]) with each $p_j > 0$, and we shall assume that it is
our initial distribution. We define $f$ on $R$ by $f(1) = 0$ and $f(j) = 1$ for $j > 1$,
so that $\mu = \sum_{j=1}^{\infty} f(j)p_j = 1 - p_1 < 1$. Just as before, we let $S_n = \sum_{i=1}^{n} f(X_i)$
and we shall show that if a = 1 then lim sup (1/n) log P{S, 2 na} = 0. We 
define a subsequence {m(k): k = 1, 2,---} of integers by 


$$m(k) = 1 + 2 + \cdots + k = k(k + 1)/2.$$


For each $k$ we define a sequence of $m(k)$ states by $\omega_{m(k)} = (1, 2, 1, 3, 2, 1, 4,
3, 2, 1, \ldots, k, k-1, \ldots, 1)$. Clearly $(1/m(k))S_{m(k)}$ evaluated at $\omega_{m(k)}$

is equal to (m(k) — k)/m(k) which converges to 1. Also P(wag)) = pc "(ki)~* 

so that lim sup (n™') log P(S, 2 na) = lim (1/m(k)) log P(waa)) = 0. 
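
The path used in this example can be built explicitly. The sketch below constructs $\omega_{m(k)}$, checks its length $m(k) = k(k+1)/2$ and the value of $S_{m(k)}$ along it, and evaluates the logarithm of the product of the $k - 1$ jump probabilities from state 1 divided by $m(k)$. The exponent in $p_{1j} = c/j^{s}$ is not legible in the scanned text; $s = 3$ is assumed here (some $s > 2$ is needed for the mean return time to state 1 to be finite), and the constant factor $p_1$ is omitted since it does not affect the limit.

```python
import math

def excursion_path(k):
    """Return (1, 2, 1, 3, 2, 1, ..., k, k-1, ..., 1)."""
    path = []
    for top in range(1, k + 1):
        path.extend(range(top, 0, -1))
    return path

def f(state):
    return 0 if state == 1 else 1

def log_path_probability_rate(k, s=3, terms=100000):
    # c normalizes p_{1j} = c / j**s; the exponent s is an assumed value.
    c = 1.0 / sum(j ** -s for j in range(1, terms + 1))
    # The path makes the jumps 1 -> 2, 1 -> 3, ..., 1 -> k, and every other
    # step j -> j-1 has probability one, so the product is c**(k-1) / (k!)**s.
    log_prob = (k - 1) * math.log(c) - s * math.lgamma(k + 1)
    return log_prob / (k * (k + 1) // 2)

path = excursion_path(10)
total = sum(f(x) for x in path)
print(len(path), total)

rate_100 = log_path_probability_rate(100)
rate_1000 = log_path_probability_rate(1000)
print(rate_100, rate_1000)
```

Since $\log k!$ grows like $k \log k$ while $m(k)$ grows like $k^2/2$, the normalized log probability tends to zero, so no bound decaying exponentially in the path length can hold.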

REFERENCES 

[1] Herman Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based
on the sum of observations,” Ann. Math. Stat., Vol. 23 (1952), pp. 493-507.

[2] K. L. Chung, “Contributions to the theory of Markov chains. II,” Trans. Amer. Math.
Soc., Vol. 76, No. 3 (1954), pp. 397-419.

[3] W. Doblin, “Sur deux problèmes de M. Kolmogoroff concernant les chaînes dénombrables,”
Bull. Soc. Math. France, Vol. 52 (1938), pp. 210-220.

[4] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.

[5] William Feller, An Introduction to Probability Theory and Its Applications, 2nd ed.,
John Wiley and Sons, New York, 1957.

[6] A. I. Khinchin, Mathematical Foundations of Information Theory, Dover Pub., New
York, 1957.

[7] Lambert Herman Koopmans, “Asymptotic rate of discrimination for Markov processes,”
unpublished Ph.D. thesis, University of California, Berkeley, 1958.

[8] Michel Loève, Probability Theory, D. Van Nostrand, New York, 1955.

[9] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill Book Co., New
York, 1937.





APPLICATION OF STORAGE THEORY TO QUEUES WITH 
POISSON ARRIVALS 


By N. U. Prabhu
Karnatak University, Dharwar, India 


0. Summary. This paper is concerned with the waiting time process, W(t), for 
the queueing system in which (1) there is only one counter, (2) the customers 
arrive at random and are served in the order of arrival, and (3) the service time 
distribution has a general form. It is observed that the Pollaczek-Khintchine 
formula for the transform of the limiting distribution of W(t) is similar to the 
one occurring in the theory of continuous time storage processes, and it is in- 
verted by the method used in that theory. Further, W(t) is shown to be a special 
case of the storage process, and known methods and results of the storage theory 
are used to obtain the transition distribution function of W(t). 


1. Introduction. Several analogies of storage processes with those occurring 
in the theory of queues have been pointed out by Smith [19], Gani [4], Prabhu 
[16], and many others; by making use of one of these, Gani and Prabhu [5] ob- 
tained further results in storage theory. It has been remarked, however, that the 
analogy between the two situations is in their mathematical formalisms rather 
than in their physical models. This statement is essentially true if we confine
ourselves to discrete time storage processes, but there exists an exact analogy if we
consider continuous time models. The initial attempts to set up such a model
were by using limiting methods (Moran [14], Gani [3], Downton [2]); this
procedure has, however, proved cumbersome, and has obscured the essential
features of the underlying stochastic process. In some recent work, Gani and Prabhu
([6], [7], [8], [9]) have given a systematic treatment of the various problems
occurring in continuous time storage processes. The model they consider is the one
based on Moran's [12] discrete time model for the dam, and is specified by the
following assumptions.

(a) Let $X(t)$ represent the input during a time interval of length $t$; we assume
that $X(t)$ is an additive process with stationary increments. Let $K(x, t)$ be the
cumulative distribution function (c.d.f.) of $X(t)$, so that

(1.1) $K(x, t) = \Pr\{X(t) \le x\}$ $(0 \le x < \infty,\ 0 \le t < \infty)$;

it is known that the Laplace transform of $X(t)$ is given by

(1.2) $\int_0^\infty e^{-\theta x}\,dK(x, t) = e^{-t\xi(\theta)}$ $(R(\theta) > 0)$,

where $\xi(\theta)$ is a function of a specified type.
(b) The release is continuous and occurs at a unit rate except when the store
is empty.

Received June 1, 1959; revised December 3, 1959. 







(c) The store has infinite capacity. 


If we denote by $Z(t)$ the storage at any time $t$, it follows from the above
assumptions that

(1.3) $Z(t + dt) = Z(t) + dX(t) - \min\{Z(t) + dX(t),\ dt\}$

for $0 \le t < \infty$. Clearly $Z(t)$ is a temporally homogeneous Markov process;
this process has been studied by Gani and Prabhu in the case of a Poisson input, 
and also in the case of a continuous infinitely divisible input of the Poisson type. 
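
The recursion (1.3) can be simulated exactly between input epochs when the input is compound Poisson. The following is a minimal sketch under assumed parameters: inputs arrive at Poisson rate 0.5 and have exponentially distributed sizes with mean 1 (both choices are illustrative, not taken from the paper). The store is released at unit rate and stays at zero once empty; the long-run fraction of time it is empty should be close to one minus the mean input per unit time.

```python
import random

def fraction_of_time_empty(rate, mean_jump, n_jumps, seed=0):
    """Simulate the store between successive inputs and return the empty-time fraction."""
    rng = random.Random(seed)
    z = 0.0            # current storage Z(t)
    total_time = 0.0
    empty_time = 0.0
    for _ in range(n_jumps):
        gap = rng.expovariate(rate)          # time until the next input
        if gap > z:
            # The store empties after z time units and stays empty for the rest.
            empty_time += gap - z
            z = 0.0
        else:
            z -= gap                          # unit-rate release
        total_time += gap
        z += rng.expovariate(1.0 / mean_jump)  # input of random size
    return empty_time / total_time

empty_fraction = fraction_of_time_empty(rate=0.5, mean_jump=1.0, n_jumps=200000)
print(round(empty_fraction, 3))
```

With these parameters the mean input per unit time is 0.5, so the simulated fraction should be near 0.5.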

Next, consider the queueing system in which (a) the customers arrive ‘at
random’, i.e. the inter-arrival times have the negative exponential distribution
$\lambda e^{-\lambda t}\,dt$ $(0 \le t < \infty)$; (b) the queue discipline is ‘first come, first served’,
and (c) there is only one counter and the service time has the distribution
$dB(t)$ $(0 \le t < \infty)$. Let $W(t)$ denote the waiting time of a customer who
arrives at time $t$ (i.e. the time spent by him in the queue before the commencement
of his service); $W(t) > 0$ as long as the counter is occupied, but at any
time that the counter becomes free, $W(t)$ becomes zero and remains zero until
a customer arrives. It is easily seen that $W(t)$ is a temporally homogeneous
Markov process; the Laplace transform of its distribution when the queue is
in ‘statistical equilibrium’ is given by the Pollaczek-Khintchine formula,

(1.4) $\phi^*(\theta) = \dfrac{(1 - \rho)\theta}{\theta - \lambda + \lambda\psi(\theta)}$, $R(\theta) \ge 0$,

where $\psi(\theta)$ is the Laplace transform of the service time distribution $dB(t)$,
$\rho = -\lambda\psi'(0)$ is the relative traffic intensity measured in erlangs, and it is
assumed that $\rho < 1$ (Pollaczek [15], Khintchine [11]). Further, let us denote
the transition d.f. of $W(t)$ by $F(z_0; z, t)$, so that

(1.5) $F(z_0; z, t) = \Pr\{W(t) \le z \mid W(0) = z_0\}$,

where $0 \le F(z_0; z, t) \le 1$ for $0 \le z < \infty$ and $0 \le t < \infty$. We may sometimes
simplify this notation for the d.f. to $F(z, t)$. The forward Kolmogorov
equation of the process $W(t)$ is

(1.6) $\dfrac{\partial}{\partial t}F(z, t) - \dfrac{\partial}{\partial z}F(z, t) = -\lambda F(z, t) + \lambda\int_0^z F(z - u, t)\,dB(u)$, $z \ge \max(0, z_0 - t)$,


a result due to Takács [20]. The purpose of this paper is to show that W(t) is a
special case of the continuous time storage process described above and to apply 
known methods and results to obtain F(z, t) explicitly from (1.6). For this pur- 
pose we first invert the formula (1.4); this is done in the next section. 
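
As a quick consistency check on the Pollaczek-Khintchine formula (1.4), read as $\phi^*(\theta) = (1-\rho)\theta/(\theta - \lambda + \lambda\psi(\theta))$, one can specialize to exponential service times, for which the equilibrium waiting time is well known: it is zero with probability $1 - \rho$ and otherwise exponential with rate $\mu - \lambda$. The sketch below assumes that special case (service rate $\mu$, so $\psi(\theta) = \mu/(\mu + \theta)$ and $\rho = \lambda/\mu$) and compares the two transforms; the function names are illustrative.

```python
def pollaczek_khintchine(theta, lam, psi, mean_service):
    """Transform (1.4) for arrival rate lam and service-time transform psi."""
    rho = lam * mean_service
    return (1.0 - rho) * theta / (theta - lam + lam * psi(theta))

def exponential_case_transform(theta, lam, mu):
    """Transform of: mass (1 - rho) at zero plus rho times an Exp(mu - lam) law."""
    rho = lam / mu
    return (1.0 - rho) + rho * (mu - lam) / (mu - lam + theta)

lam, mu = 0.5, 1.0                      # assumed rates with rho = 0.5 < 1
psi = lambda theta: mu / (mu + theta)   # exponential service (an assumption)

gaps = [abs(pollaczek_khintchine(th, lam, psi, 1.0 / mu)
            - exponential_case_transform(th, lam, mu))
        for th in (0.1, 0.5, 1.0, 2.0, 10.0)]
print(max(gaps) < 1e-12)
```

Algebraically, both expressions reduce to $(1 - \rho)(\mu + \theta)/(\mu - \lambda + \theta)$ in this special case.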


2. Inversion of the Pollaczek-Khintchine formula. This formula, (1.4), implies
that if $\rho < 1$, then the limiting distribution $F^*(z) = \lim_{t\to\infty} F(z, t)$ exists,
and the Laplace transform

(2.1) $\phi^*(\theta) = \int_{0-}^{\infty} e^{-\theta z}\,dF^*(z)$

is given by (1.4). Beneš [1] has inverted this and obtained $F^*(z)$ in the form of a
compound geometric distribution. An alternative method of inversion is the
one used by Daniels (see the discussion in Kendall [10]) to deal with Moran's
formula for the limiting distribution of the dam storage. To apply this method
let us assume that an analytic extension of $\psi(\theta)$ to a full neighbourhood of the
origin in the $\theta$-plane exists. We then note that, if $\rho < 1$, there exists a real
$-c < 0$ such that $\theta - \lambda + \lambda\psi(\theta) < 0$ for $-c < \theta < 0$; the formula (1.4)
is then valid also for $-c < \theta < 0$. For this range of $\theta$ we can write


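(2.2) $1/(\lambda - \lambda\psi(\theta) - \theta) = \int_0^\infty e^{-t[\lambda - \lambda\psi(\theta) - \theta]}\,dt$.

However, we have

(2.3) $\int_0^\infty e^{-\theta x}\,dK(x, t) = e^{-\lambda t[1 - \psi(\theta)]}$,

where $dK(x, t)$ is the compound Poisson distribution,

(2.4) $dK(x, t) = \sum_{n=0}^{\infty} e^{-\lambda t}\dfrac{(\lambda t)^n}{n!}\,dB_n(x)$, $(0 \le x < \infty)$,

where $B_n(x)$ is the n-fold convolution of $B(x)$ with itself, and $B_0(x) = 0$ if
$x < 0$ and $= 1$ if $x \ge 0$. Substituting (2.3) in (2.2) we obtain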


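(2.5) $1/(\lambda - \lambda\psi(\theta) - \theta) = \int_{t=0}^{\infty}\int_{x=0}^{\infty} e^{-\theta(x - t)}\,dK(x, t)\,dt$

$\qquad = \int_{t=0}^{\infty}\int_{z=-t}^{\infty} e^{-\theta z}\,dK(z + t, t)\,dt$

$\qquad = \int_{z=0}^{\infty} e^{-\theta z}\int_{t=0}^{\infty} dK(z + t, t)\,dt + \int_{z=0+}^{\infty} e^{\theta z}\int_{t=z}^{\infty} dK(t - z, t)\,dt.$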


taz 


| dK(t —2z,t)d = > | yy sd dB,(t — z) (z > 0) 


n= ! 


is the constant term in the series 


ft a dBt — 2) = Dea *{y(d — dad)” 
9 


n=O “tmz 
- «? haje > {Hedy 


9 a 


(2.6) 







where Gia) = ¥(\ — da) is the probability generating function of the number 
of arrivals in a service period of arbitrary duration. Now choose a real a, such 
that 1 < a < ¢ and a real a: such that G(a,) < Gla.) < a (which is possible 
since G(a) is continuous in a). Here ¢ is the real positive root (other than unity ) 
of the equation Gla) = a, and ¢> 1 since G’(1) = —dAW/(0) = p < 1. 
In the annulus a < |a| < a we have |G(a)| < G(a:.) < a < \a\, so 
that |G(a)/a| <1 and the right hand side of (2.6) becomes 


ae OO" (a i Gla) ). 


The constant term in this is given by the formula 


1 e A-ha) 2 _ Onwme 
. ~ da = lim (a — 1) ———-~ = (1 — oy 
2ri a8 nae tae ne Oe eS p) 

since a = | is the only pole of the integrand within the circle | a | < a, . Hence 
we have the result

(2.7) $\int_{t=z}^{\infty} dK(t - z, t)\,dt = (1 - \rho)^{-1}$ $(\rho < 1,\ z > 0)$.


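Using (2.7), we can simplify (2.5) as

(2.8) $\dfrac{1}{\lambda - \lambda\psi(\theta) - \theta} + \dfrac{1}{(1 - \rho)\theta} = \int_{z=0}^{\infty} e^{-\theta z}\int_{t=0}^{\infty} dK(z + t, t)\,dt.$

We now have

$$\int_0^\infty e^{-\theta z}[1 - F^*(z)]\,dz = \frac{1}{\theta} - \frac{1}{\theta}\int_{0-}^{\infty} e^{-\theta z}\,dF^*(z)$$

$$\qquad = \frac{1}{\theta} + \frac{1 - \rho}{\lambda - \lambda\psi(\theta) - \theta} \quad \text{from (1.4)}$$

$$\qquad = (1 - \rho)\int_{z=0}^{\infty} e^{-\theta z}\int_{t=0}^{\infty} dK(z + t, t)\,dt$$

from (2.8). This is true for $-c < \theta < 0$, and hence for all $\theta$. Thus we obtain

(2.9) $F^*(z) = 1 - (1 - \rho)\int_{t=0}^{\infty} dK(t + z, t)$

as the limiting distribution of the waiting time.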


3. The waiting time W(t) as a storage process. It is possible to demonstrate
the equivalence of (2.9) and Beneš' result for the limiting distribution $F^*(z)$,
but the connection between the Pollaczek-Khintchine formula (1.4) and the
compound Poisson distribution does not seem to have been noticed so far. This
distribution, however, is of fundamental importance in the theory of queues
with Poisson arrivals. In fact, if $N$ is the number of customers who arrive during
the interval $(0, t)$, then their total service time has the distribution $dB_N(x)$,
and, since $N$ is a random variable having the Poisson distribution with mean $\lambda t$,
it follows that the total service time of customers arriving during $(0, t)$ has the
compound Poisson distribution (2.4). This distribution can be considered as
the ‘service potential’, which is steadily exhausted by the server at unit rate
per unit time except when it is zero. Viewed in this manner, the waiting time
for a queue with Poisson arrivals reduces to a special case of the storage process
described in section 1, where the input $X(t)$ has the distribution (2.4). Clearly,
$X(t)$ is an additive process with stationary increments, and from (2.3) it is
seen that its Laplace transform is given by


(3.1) $\int_0^\infty e^{-\theta x}\,dK(x, t) = e^{-t\xi(\theta)}$

as in (1.2), with

(3.2) $\xi(\theta) = \lambda\int_0^\infty (1 - e^{-\theta u})\,dB(u)$.

The particular case of ‘regular’ service time has been considered by Gani [3]
and Moran [13] in their treatment of certain finite dam models, and the
time-dependent solution of Takács's equation (1.6) in this case has been obtained
by Gani and Prabhu [8]. In the case of an arbitrary service time distribution,
this integro-differential equation is similar to the one obtained by Gani and
Prabhu [9] for the storage process with continuous infinitely divisible inputs of
the Poisson type, and can be solved by the method used there. This is done in
section 4. As a preliminary result however, we require the probability $F(z_0; 0, t)$
of not having to wait (i.e. the probability of finding the counter free) at time $t$.
To obtain this, we first find the probability $dG(z, t)$ that the counter becomes
free for the first time at $t$, given that the waiting time of the initial customer was
$W(0) = z > 0$. This is analogous to the probability distribution of the ‘wet
period’ in a dam. Following Kendall [10] it can be proved quite generally that,
for an input of the general additive type with the distribution $dK(z, t)$, this
distribution is given by $dG(z, t) = (z/t)\,dK(t - z, t)$, and its Laplace transform
by $e^{-z\eta(\theta)}$ where $\eta(\theta)$ satisfies the functional equation $\eta(\theta) = \theta + \xi\{\eta(\theta)\}$,
$\theta > 0$. Applying these results to the waiting time process $W(t)$, we find that


(3.3) $dG(z, t) = \sum_{n=0}^{\infty} e^{-\lambda t}\lambda z\dfrac{(\lambda t)^{n-1}}{n!}\,dB_n(t - z)$.


This result has been directly proved by Prabhu [17]. Further, we obtain

(3.4) $\int_{t=z}^{\infty} e^{-\theta t}\,dG(z, t) = e^{-z\eta(\theta)}$,

where $\eta(\theta)$ satisfies the functional equation

(3.5) $\eta(\theta) = \theta + \lambda - \lambda\psi\{\eta(\theta)\}$, $\theta > 0$.

This equation is essentially the same as one considered by Takács [20], who
proved that it has a unique solution.
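
The functional equation (3.5), read as $\eta(\theta) = \theta + \lambda - \lambda\psi\{\eta(\theta)\}$, can be solved numerically by simple iteration. The sketch below is an illustration under assumed conditions, not the method of [20]: service times are taken to be exponential with rate $\mu$, so that $\psi(s) = \mu/(\mu + s)$ and the equation becomes a quadratic whose positive root is available in closed form for comparison. For this choice the iteration map has slope at most $\lambda/\mu < 1$ on $\eta \ge 0$, so the iteration converges.

```python
import math

def solve_eta(theta, lam, psi, iterations=200):
    """Iterate eta <- theta + lam - lam * psi(eta), starting from eta = theta."""
    eta = theta
    for _ in range(iterations):
        eta = theta + lam - lam * psi(eta)
    return eta

lam, mu, theta = 0.5, 1.0, 0.3       # assumed values with lam / mu < 1
psi = lambda s: mu / (mu + s)        # exponential service (an assumption)

eta = solve_eta(theta, lam, psi)
residual = abs(eta - (theta + lam - lam * psi(eta)))

# For exponential service (3.5) reduces to eta^2 + (mu - theta - lam) eta - theta mu = 0.
b = theta + lam - mu
closed_form = 0.5 * (b + math.sqrt(b * b + 4.0 * theta * mu))

print(residual < 1e-12, abs(eta - closed_form) < 1e-9)
```

The same iteration can be tried with another transform $\psi$, although convergence then has to be checked separately.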







Now $F(z_0; 0, t)$ is the probability of finding the counter free at time $t$, not
necessarily for the first time; by a direct enumeration of the ways in which this can
happen we obtain

(3.6) $F(z_0; 0, t) = \int_{\tau=z_0}^{t} dG(z_0, \tau)\,F(0; 0, t - \tau)$.

We assert that the solution of the integral equation (3.6) is given by

(3.7) $F(z_0; 0, t)\,dt = \begin{cases} 0 & \text{if } t < z_0, \\ \int_{z=z_0}^{t} dG(z, t)\,dz & \text{if } t \ge z_0. \end{cases}$

To prove this statement, let us multiply the right hand side of (3.6) by $dt$,
and substitute (3.7); we obtain

$$\int_{\zeta=0}^{t-z_0} d\zeta\int_{\tau=z_0}^{t-\zeta} dG(z_0, \tau)\,dG(\zeta, t - \tau) = \int_{\zeta=0}^{t-z_0} dG(z_0 + \zeta, t)\,d\zeta$$

$$\qquad = \int_{z=z_0}^{t} dG(z, t)\,dz = F(z_0; 0, t)\,dt,$$

where we have used the fact that the distribution $dG(z, t)$ is additive in the
parameter $z$, which is evident from its Laplace transform. We have thus proved
our assertion, and (3.7) gives the probability of finding the counter free at
time $t$.

The Laplace transform of (3.7) is given by

(3.8) $\int_{t=z_0}^{\infty} e^{-\theta t}F(z_0; 0, t)\,dt = \int_{z=z_0}^{\infty} dz\int_{t=z}^{\infty} e^{-\theta t}\,dG(z, t)$

$\qquad = \int_{z_0}^{\infty} e^{-z\eta(\theta)}\,dz = \dfrac{e^{-z_0\eta(\theta)}}{\eta(\theta)}$ from (3.4),

where $\eta(\theta)$ is given by (3.5).
The results (3.4) and (3.8) are due to Beneš [1]; however, the explicit
expressions for $dG(z, t)$ and $F(0, t)$ have not been given previously.


4. The transition d.f. of W(t). We are now in a position to solve the integro- 
differentia! equation (1.6) and to obtain the transition df. of F(z, t) for the 
waiting time process W(t). Let us denote the Laplace transform of dF(z, t) 
by ¢( 4, t), so that 


(4.1) o(6, t) - | e* dF(z, t) (R(8) > 0). 


Taking Laplace transforms of both sides of (1.6) with respect to z, we obtain a 
differential equation in ¢(@, t), which readily yields the solution ¢(@, t) = 
e SOM) _ of, F(0,t — roe ** dr as obtained by Takacs [19]. Writing 
this in a slightly different way, we have 





APPLICATION OF STORAGE THEORY 


t 

(4.2) O'9(6,t) = oe BOW _ I F(O, t — r)e 8" dr. 
0 

Now from (3.1) we have 


(4.3) [ e"K(z, t) dz = oe" 


and 

(44) [ a e"K(2 + 2, t) dz = oe 80, 

so that (4.2) yields the relation

(4.5)    $F(z, t) = K(t + z - z_0, t) - \int_0^{t} F(0, t - \tau)\, dK(\tau + z, \tau).$


However, the validity of this inversion rests on proving that the right hand side
of (4.5) vanishes for $z < \max(0, z_0 - t)$ or $-z > \min(0, t - z_0)$. Thus we
have to show that

(4.6)    $K(t - z - z_0, t) = \int_0^{t - z_0} F(0, t - \tau)\, dK(\tau - z, \tau) \qquad (0 < z < t - z_0).$
In order to do this, consider the process Z(t) = W(0) + X(t) - t, which is a
temporally homogeneous Markov process with the transition d.f.

(4.7)    $P(z_0; z, t) = K(t + z - z_0, t).$

By a direct enumeration of the paths $\zeta \to -z$ in this process we obtain

(4.8)    $dP(\zeta; -z, t) = \int_0^{t} dG(\zeta; 0, \tau)\, dP(0; -z, t - \tau) \qquad (\zeta > 0),$

where $dG(\zeta; 0, \tau)$ denotes the probability of the first transition $\zeta \to 0$. Clearly,
this is the same as the corresponding probability for our process W(t), since
the first transition $\zeta \to 0$ can be made only through positive values. Thus (4.8)
can be written as

(4.9)    $k(t - z - \zeta, t) = \int_0^{t - \zeta} dG(\zeta, t - \tau)\, k(\tau - z, \tau) \qquad (0 < \zeta \le t - z),$

where we have written dK(z, t) = k(z, t) dz for convenience. Integrating
(4.9) over $z_0 \le \zeta \le t - z$ we obtain

$K(t - z - z_0, t) = \int_0^{t - z_0} k(\tau - z, \tau) \int_{z_0}^{t - \tau} dG(\zeta, t - \tau)\, d\zeta = \int_0^{t - z_0} F(0, t - \tau)\, k(\tau - z, \tau)\, d\tau$

using (3.7). Thus we have proved (4.6); our inversion is therefore valid, and
(4.5) gives the transition d.f. of the waiting time process W(t).







By an argument similar to the one used by Gani and Prabhu [9], it can be
proved that the limiting distribution $F^*(z) = \lim_{t \to \infty} F(z, t)$ exists independently
of $z_0$ if $\rho < 1$ and is given by (2.9). This confirms the result obtained in Section 2.

In conclusion it may be noted that the integro-differential equation of Takács
in the general case where the Poisson parameter $\lambda$ is a function of time has been
studied by Reich [18], who reduces it to a Volterra equation of the first kind.
However, for the case $\lambda$ = constant, we believe that our solution is much more
straightforward, and that our method is applicable to more general distributions
of the service potential; this is being investigated.

REFERENCES
[1] V. E. Beneš, "On queues with Poisson arrivals," Ann. Math. Stat., Vol. 28 (1957),
pp. 670-677.
[2] F. Downton, "A note on Moran's theory of dams," Quart. J. Math. (Oxford, 2nd Ser.),
Vol. 8 (1957), pp. 282-286.
[3] J. Gani, "Some problems in the theory of provisioning and of dams," Biometrika, Vol.
42 (1955), pp. 179-200.
[4] J. Gani, "Problems in the probability theory of storage systems," J. Roy. Stat. Soc.,
Ser. B, Vol. 19 (1957), pp. 181-206.
[5] J. Gani and N. U. Prabhu, "Stationary distributions of the negative exponential type
for the infinite dam," J. Roy. Stat. Soc., Ser. B, Vol. 19 (1957), pp. 342-351.
[6] J. Gani and N. U. Prabhu, "Continuous time treatment of a storage problem," Nature,
Vol. 182 (1958), pp. 39-40.
[7] J. Gani and N. U. Prabhu, "Remarks on the dam with Poisson type inputs," Aust. J.
Appl. Sci., Vol. 10 (1959), pp. 113-122.
[8] J. Gani and N. U. Prabhu, "The time-dependent solution for a storage model with
Poisson input," J. Math. and Mech., Vol. 8 (1959), pp. 653-664.
[9] J. Gani and N. U. Prabhu, "A storage model for continuous infinitely divisible inputs
of Poisson type," submitted to Proc. Cam. Phil. Soc.
[10] David G. Kendall, "Some problems in the theory of dams," J. Roy. Stat. Soc., Ser. B,
Vol. 19 (1957), pp. 207-212.
[11] A. Khintchine, "Matematicheskaya teoriya stationarnoi ocheredi," Matem. Sbornik,
Vol. 39 (1932), pp. 73-84.
[12] P. A. P. Moran, "A probability theory of dams and storage systems," Aust. J. Appl.
Sci., Vol. 5 (1954), pp. 116-124.
[13] P. A. P. Moran, "A probability theory of dams and storage systems: modifications of
the release rules," Aust. J. Appl. Sci., Vol. 6 (1955), pp. 117-130.
[14] P. A. P. Moran, "A probability theory of dams with a continuous release," Quart. J.
Math. (Oxford, 2nd Ser.), Vol. 7 (1956), pp. 130-137.
[15] F. Pollaczek, "Über eine Aufgabe der Wahrscheinlichkeitsrechnung," Math. Zeit.,
Vol. 32 (1930), pp. 64-100; 729-850.
[16] N. U. Prabhu, "Some exact results for the finite dam," Ann. Math. Stat., Vol. 29
(1958), pp. 1234-1243.
[17] N. U. Prabhu, "Some results for the queue with Poisson arrivals," to appear in J. Roy.
Stat. Soc., Ser. B, Vol. 22 (1960).
[18] Edgar Reich, "On the integro-differential equation of Takács," I, Ann. Math. Stat.,
Vol. 29 (1958), pp. 563-570; II, Ann. Math. Stat., Vol. 30 (1959), pp. 143-148.
[19] Walter L. Smith, "On the distribution of queueing times," Proc. Cam. Phil. Soc., Vol.
49 (1953), pp. 449-461.
[20] L. Takács, "Investigation of waiting time problems by reduction to Markov processes,"
Acta Math. (Budapest), Vol. 6, pp. 101-129.





A ONE-SIDED ANALOG OF KOLMOGOROV'S INEQUALITY¹
By Albert W. Marshall
Stanford University 


1. Introduction and summary. It is well known (see e.g. [4] p. 198) that for
every positive $\varepsilon$ and every square integrable random variable X with zero
expectation, $P\{X \ge \varepsilon\} \le E(X^2)/[\varepsilon^2 + E(X^2)]$. In this paper an inequality is
obtained that generalizes this in the same way that Kolmogorov's inequality
generalizes Chebyshev's inequality. The inequality is proved in Section 2 and
an example is given to show that equality can be achieved. In Section 3 an
extension to continuous parameter martingales is obtained, and a condition under
which equality can be achieved is given.

2. The inequality. 

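THEOREM 2.1. Let $X_1, X_2, \ldots, X_n$ be random variables with $E(X_1) = E(X_i \mid X_1, \ldots, X_{i-1}) = 0$ a.e. ($i = 2, 3, \ldots, n$), and $E(X_i^2) = \sigma_i^2 < \infty$
($i = 1, 2, \ldots, n$). Then, for every positive $\varepsilon$,

(1)    $P\{\max_{1 \le i \le n} (X_1 + X_2 + \cdots + X_i) \ge \varepsilon\} \le s_n/(\varepsilon^2 + s_n)$, where $s_n = \sum_{i=1}^{n} \sigma_i^2$.

Note that, if $Y_i = \sum_{j=1}^{i} X_j$, $i = 1, 2, \ldots, n$, then $\{Y_i, 1 \le i \le n\}$ is a
martingale and $E(Y_n^2) = s_n$.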

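PROOF. Let $F(x) = F(x_1, x_2, \ldots, x_n) = (\varepsilon \sum_{i=1}^{n} x_i + s_n)^2/(\varepsilon^2 + s_n)^2$, and
let

$B_k = \{X_1 + X_2 + \cdots + X_j < \varepsilon,\ j = 1, 2, \ldots, k - 1;\ X_1 + X_2 + \cdots + X_k \ge \varepsilon\}, \qquad k = 1, 2, \ldots, n.$

Then

$\int F(X)\, dP \ge \sum_{k=1}^{n} \int_{B_k} F(X)\, dP \ge \sum_{k=1}^{n} \int_{B_k} \frac{(\varepsilon \sum_{i=1}^{k} X_i + s_n)^2}{(\varepsilon^2 + s_n)^2}\, dP \ge \sum_{k=1}^{n} P(B_k) = P\{\max_{1 \le i \le n} (X_1 + \cdots + X_i) \ge \varepsilon\}.$

Since $\int F(X)\, dP = s_n/(\varepsilon^2 + s_n)$, the proof is complete. Note the similarity of
this proof to the standard proof of Kolmogorov's inequality (see e.g. [1] p. 105,
314 or [3] p. 235, 386).

To show that equality can be achieved in (1), let $s_k = \sum_{i=1}^{k} \sigma_i^2$, $k = 1, 2, \ldots, n$,
and let $Z = (Z_1, Z_2, \ldots, Z_n)$ be a random variable having the following
distribution: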


Received June 18, 1959; revised October 28, 1959.
¹ This research was sponsored by the Office of Naval Research.




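$P\{Z = (\varepsilon, 0, \ldots, 0)\} = \sigma_1^2/(\varepsilon^2 + s_1) = \pi_1,$

$P\{Z = \varepsilon^{-1}(-\sigma_1^2, -\sigma_2^2, \ldots, -\sigma_{k-1}^2, \varepsilon^2 + s_{k-1}, 0, \ldots, 0)\} = \frac{\varepsilon^2\sigma_k^2}{(\varepsilon^2 + s_{k-1})(\varepsilon^2 + s_k)} = \pi_k, \qquad k = 2, 3, \ldots, n,$

$P\{Z = \varepsilon^{-1}(-\sigma_1^2, -\sigma_2^2, \ldots, -\sigma_n^2)\} = \varepsilon^2/(\varepsilon^2 + s_n).$

It is easily verified by induction on j that

(2)    $\sum_{k=1}^{j} \pi_k = 1 - \varepsilon^2/(\varepsilon^2 + s_j), \qquad j = 1, 2, \ldots, n,$

so that this is a valid probability distribution. Clearly, $E(Z_1) = 0$. It can be
shown that $E(Z_j \mid Z_1, \ldots, Z_{j-1}) = 0$ a.e. (by first computing
$E(Z_j \mid Z_{j-1} \ne -\sigma_{j-1}^2/\varepsilon)$ and $E(Z_j \mid Z_{j-1} = -\sigma_{j-1}^2/\varepsilon)$)
and that $E(Z_j^2) = \sigma_j^2$, $j = 1, 2, \ldots, n$. Thus the random variable Z satisfies
the conditions of Theorem 2.1; furthermore, equality holds in (1) whenever
$(X_1, \ldots, X_n) = Z$.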
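The extremal distribution can be checked mechanically. The following is a minimal sketch in Python, using exact rational arithmetic; the function names `extremal_distribution` and `verify` are illustrative and not from the paper, and the atoms are taken to be $\varepsilon^{-1}(-\sigma_1^2, \ldots, -\sigma_{k-1}^2, \varepsilon^2 + s_{k-1}, 0, \ldots, 0)$ with probability $\varepsilon^2\sigma_k^2/[(\varepsilon^2 + s_{k-1})(\varepsilon^2 + s_k)]$, which is the reading of the distribution assumed here.

```python
from collections import defaultdict
from fractions import Fraction


def extremal_distribution(eps, variances):
    """Support of Z for a given eps and list of sigma_i^2, as (probability, point) pairs."""
    eps = Fraction(eps)
    sig2 = [Fraction(v) for v in variances]
    n = len(sig2)
    s = [Fraction(0)]                       # s[k] = sigma_1^2 + ... + sigma_k^2
    for v in sig2:
        s.append(s[-1] + v)
    e2 = eps * eps
    support = []
    for k in range(1, n + 1):               # partial sums first reach eps at step k
        prob = e2 * sig2[k - 1] / ((e2 + s[k - 1]) * (e2 + s[k]))
        point = [-sig2[i] / eps for i in range(k - 1)]
        point += [(e2 + s[k - 1]) / eps] + [Fraction(0)] * (n - k)
        support.append((prob, tuple(point)))
    # last atom: the partial sums never reach eps
    support.append((e2 / (e2 + s[n]), tuple(-v / eps for v in sig2)))
    return support, s[n]


def verify(eps, variances):
    """Check the hypotheses of Theorem 2.1; return (P{max partial sum >= eps}, bound)."""
    support, s_n = extremal_distribution(eps, variances)
    eps = Fraction(eps)
    n = len(variances)
    assert sum(p for p, _ in support) == 1
    for j in range(n):
        assert sum(p * z[j] ** 2 for p, z in support) == Fraction(variances[j])
        # E(Z_j | Z_1..Z_{j-1}) = 0: the weighted sum of Z_j vanishes on every prefix
        cond = defaultdict(Fraction)
        for p, z in support:
            cond[z[:j]] += p * z[j]
        assert all(value == 0 for value in cond.values())
    hit = sum(p for p, z in support
              if max(sum(z[:i + 1]) for i in range(n)) >= eps)
    return hit, s_n / (eps * eps + s_n)
```

For example, `verify(2, [1, 3, Fraction(1, 2)])` returns the same value twice, so the probability of the event equals the bound in (1) for that choice of $\varepsilon$ and variances.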

Kolmogorov's inequality has been extended under certain conditions by
Hájek and Rényi [2] to provide a bound for

$P\{\max_i \varepsilon_i^{-1} |X_1 + \cdots + X_i| \ge 1\} \qquad (\varepsilon_i > 0,\ i = 1, 2, \ldots, n),$

and it is natural now to ask what the best upper bound is for

$P\{\max_i \varepsilon_i^{-1} (X_1 + \cdots + X_i) \ge 1\}$

under the conditions of Theorem 2.1. Unfortunately this bound has no simple
expression even for small n, and is not easily obtained. It is given here only for
n = 2.

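THEOREM 2.2. Let $X_1$ and $X_2$ be random variables with $E(X_1) = 0$, $E(X_2 \mid X_1) = 0$ a.e.,
and $E(X_i^2) = \sigma_i^2 < \infty$, $i = 1, 2$. Then if $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$,

(3)    $P\{X_1 \ge \varepsilon_1 \text{ or } X_1 + X_2 \ge \varepsilon_2\} \le \frac{\sigma_2^2 + \sigma_1^2 (a_2/a_1)^2}{\sigma_2^2 + a_2^2/a_1},$

where $a_i = \sigma_1^2 + \eta_1\eta_i$, $i = 1, 2$, and $\eta_1 = \min(\varepsilon_1, \varepsilon_2)$, $\eta_2 = \varepsilon_2$.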
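PROOF. Following the method of Hájek and Rényi [2], we let $F(x_1, x_2) = c_1 F_1^2(x_1) + c_2 F_2^2(x_1 + x_2)$, where

$c_1 = \frac{\eta_1^2}{a_1^2} - \frac{\eta_1^2 (a_2 + \sigma_2^2)^2}{(a_2^2 + \sigma_2^2 a_1)^2}, \qquad c_2 = \frac{\eta_1^2 a_2^2}{(a_2^2 + \sigma_2^2 a_1)^2},$

$F_1(x) = \frac{\sigma_1^2}{\eta_1} + x, \qquad F_2(x) = \frac{\sigma_1^2}{\eta_1} + \frac{\sigma_2^2 a_1}{\eta_1 a_2} + x,$

and we let $B_1 = \{X_1 \ge \eta_1\}$, $B_2 = \{X_1 < \eta_1,\ X_1 + X_2 \ge \eta_2\}$. Since $a_2 \ge a_1 > 0$,
it follows that $c_1 \ge 0$, and, as in the proof of Theorem 2.1,

$\int_{B_1} F(X_1, X_2)\, dP \ge P(B_1),$

$\int_{B_2} F(X_1, X_2)\, dP \ge \int_{B_2} c_2 F_2^2(X_1 + X_2)\, dP \ge P(B_2).$

Thus $\int F(X_1, X_2)\, dP \ge P(B_1) + P(B_2) = P\{X_1 \ge \eta_1 \text{ or } X_1 + X_2 \ge \eta_2\} \ge P\{X_1 \ge \varepsilon_1 \text{ or } X_1 + X_2 \ge \varepsilon_2\}$. It is straightforward to verify that, upon integrating
the function $F(X_1, X_2)$, one obtains the bound given in (3), and this completes
the proof.

Equality is achieved in (3) whenever $(X_1, X_2)$ has the following distribution:

$P\{(X_1, X_2) = (\eta_1, 0)\} = \frac{\sigma_1^2}{a_1}, \qquad P\left\{(X_1, X_2) = \left(-\frac{\sigma_1^2}{\eta_1}, \frac{a_2}{\eta_1}\right)\right\} = \frac{\eta_1^2\sigma_2^2}{a_1\sigma_2^2 + a_2^2},$

$P\left\{(X_1, X_2) = \left(-\frac{\sigma_1^2}{\eta_1}, -\frac{a_1\sigma_2^2}{\eta_1 a_2}\right)\right\} = \frac{\eta_1^2 a_2^2}{a_1(a_1\sigma_2^2 + a_2^2)}.$

In this case, $P\{X_1 \ge \eta_1 \text{ or } X_1 + X_2 \ge \eta_2\} = P\{X_1 \ge \varepsilon_1 \text{ or } X_1 + X_2 \ge \varepsilon_2\}$.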
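As a check on the two-variable case, here is a short Python sketch that evaluates the right-hand side of (3) and confirms that a three-point distribution attains it. It assumes the reading $a_i = \sigma_1^2 + \eta_1\eta_i$, $\eta_1 = \min(\varepsilon_1, \varepsilon_2)$, $\eta_2 = \varepsilon_2$, with atoms $(\eta_1, 0)$, $(-\sigma_1^2/\eta_1, a_2/\eta_1)$ and $(-\sigma_1^2/\eta_1, -a_1\sigma_2^2/(\eta_1 a_2))$; the function names are illustrative only.

```python
from fractions import Fraction


def _constants(eps1, eps2, var1, var2):
    eps1, eps2, var1, var2 = map(Fraction, (eps1, eps2, var1, var2))
    eta1, eta2 = min(eps1, eps2), eps2
    a1 = var1 + eta1 * eta1
    a2 = var1 + eta1 * eta2
    return eps1, eps2, var1, var2, eta1, a1, a2


def two_step_bound(eps1, eps2, var1, var2):
    """Right-hand side of (3) under the reading stated in the lead-in."""
    _, _, var1, var2, _, a1, a2 = _constants(eps1, eps2, var1, var2)
    return (var2 + var1 * (a2 / a1) ** 2) / (var2 + a2 * a2 / a1)


def extremal_points(eps1, eps2, var1, var2):
    """Three atoms (probability, (x1, x2)) of the candidate extremal distribution."""
    _, _, var1, var2, eta1, a1, a2 = _constants(eps1, eps2, var1, var2)
    d = a1 * var2 + a2 * a2
    return [
        (var1 / a1, (eta1, Fraction(0))),
        (eta1 ** 2 * var2 / d, (-var1 / eta1, a2 / eta1)),
        (eta1 ** 2 * a2 ** 2 / (a1 * d), (-var1 / eta1, -a1 * var2 / (eta1 * a2))),
    ]


def verify(eps1, eps2, var1, var2):
    """Check the moment conditions of Theorem 2.2 and return the probability of the event."""
    pts = extremal_points(eps1, eps2, var1, var2)
    e1, e2, v1, v2 = map(Fraction, (eps1, eps2, var1, var2))
    assert sum(p for p, _ in pts) == 1
    assert sum(p * x[0] for p, x in pts) == 0                 # E(X1) = 0
    assert sum(p * x[0] ** 2 for p, x in pts) == v1           # E(X1^2) = sigma_1^2
    assert pts[0][1][1] == 0                                  # X2 = 0 when X1 = eta1
    assert pts[1][0] * pts[1][1][1] + pts[2][0] * pts[2][1][1] == 0   # E(X2 | X1) = 0
    assert sum(p * x[1] ** 2 for p, x in pts) == v2           # E(X2^2) = sigma_2^2
    event = sum(p for p, x in pts if x[0] >= e1 or x[0] + x[1] >= e2)
    assert event == two_step_bound(eps1, eps2, var1, var2)
    return event
```

With $\varepsilon_1 > \varepsilon_2$ the sketch reduces to the one-sided Chebyshev bound for $X_1 + X_2$; for instance `verify(3, 2, 1, 2)` gives 3/7, which is $(\sigma_1^2 + \sigma_2^2)/(\varepsilon_2^2 + \sigma_1^2 + \sigma_2^2)$.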

Several inequalities follow from (3) simply by a change of variables. The 
corollaries below are given to illustrate the possibilities. 

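COROLLARY 2.3. Let $X_1$ and $X_2$ be random variables with $E(X_1) = a$,
$E(X_2 \mid X_1) = bX_1 + c$ a.e. (where $b \ne -1$), and Var$(X_i) = \sigma_i^2 < \infty$, $i = 1, 2$.
Then if $\varepsilon_1 - a > 0$ and $[\delta\varepsilon_2 - \delta(a + ab + c)]/|b + 1| > 0$ where
$\delta = \operatorname{sign}(b + 1)$,

(4)    $P\{X_1 \ge \varepsilon_1 \text{ or } \delta(X_1 + X_2) \ge \delta\varepsilon_2\} \le \frac{\sigma_2^2 - b^2\sigma_1^2 + \sigma_1^2[(b + 1)a_2/a_1]^2}{\sigma_2^2 - b^2\sigma_1^2 + [(b + 1)^2 a_2^2/a_1]},$

where $a_i = \sigma_1^2 + \eta_1\eta_i$, $i = 1, 2$, and $\eta_2 = [\delta\varepsilon_2 - \delta(a + ab + c)]/|b + 1|$,
$\eta_1 = \min(\varepsilon_1 - a, \eta_2)$.
PROOF. This follows from Theorem 2.2 by making the change of variables

$X_1 = X_1' + a, \quad X_2 = bX_1' + (b + 1)X_2' + ab + c, \quad \varepsilon_1 = \varepsilon_1' + a, \quad \delta\varepsilon_2 = \varepsilon_2'|b + 1| + \delta(a + ab + c)$

and dropping the primes.
Note that by taking a = b = c = 0 in this corollary, one obtains Theorem
2.2.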


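COROLLARY 2.4. Let $X_1$ and $X_2$ be random variables such that $E(X_i) = \mu_i$,
Var$(X_i) = \sigma_i^2 < \infty$, $i = 1, 2$, Cov$(X_1, X_2) = \sigma_{12} \ne 0$, and suppose that the
regression of $X_2$ on $X_1$ is linear. Then, if $\varepsilon_1 - \mu_1 > 0$ and $(\sigma_1^2\varepsilon_2 - \sigma_1^2\mu_2)/\sigma_{12} > 0$,
where $\delta = \operatorname{sign} \sigma_{12}$,

(5)    $P\{X_1 \ge \varepsilon_1 \text{ or } \delta X_2 \ge \delta\varepsilon_2\} \le \frac{\sigma_1^2(\sigma_1^2\sigma_2^2 - \sigma_{12}^2) + \sigma_1^2(a_2\sigma_{12}/a_1)^2}{\sigma_1^2(\sigma_1^2\sigma_2^2 - \sigma_{12}^2) + (a_2^2\sigma_{12}^2/a_1)},$

where $a_i = \sigma_1^2 + \eta_1\eta_i$, $i = 1, 2$, and $\eta_2 = (\sigma_1^2\varepsilon_2 - \sigma_1^2\mu_2)/\sigma_{12}$, $\eta_1 = \min(\varepsilon_1 - \mu_1, \eta_2)$.
PROOF. To obtain (5) from (3), make the change of variables $X_1 = X_1' + \mu_1$,
$X_2 = [\sigma_{12}(X_1' + X_2')/\sigma_1^2] + \mu_2$, $\varepsilon_1 = \varepsilon_1' + \mu_1$ and $\varepsilon_2 = \sigma_1^{-2}(\varepsilon_2'\sigma_{12} + \sigma_1^2\mu_2)$ in (3),
and then remove the primes.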


3. An extension to continuous parameter martingales. We begin by assuming
that the underlying probability space is such that P is complete. Then we have
the following:

THEOREM 3.1. If $\{Y_t, t \ge 0\}$ is a separable martingale with $E(Y_t) = 0$ and
$E(Y_t^2) = \sigma^2(t) < \infty$ for all $t \ge 0$, then, for every positive $\varepsilon$ and $\tau$,

(6)    $P\{\sup_{0 \le t \le \tau} Y_t \ge \varepsilon\} \le \frac{\sigma^2(\tau)}{\varepsilon^2 + \sigma^2(\tau)}.$

PROOF. Let $0 = t_1 \le t_2 \le \cdots \le t_n = \tau$. Since $X_1 = Y_{t_1}$ and $X_i = Y_{t_i} - Y_{t_{i-1}}$,
$i = 2, 3, \ldots, n$, satisfy the conditions of Theorem 2.1,

(7)    $P\{\max_{1 \le i \le n} Y_{t_i} \ge \varepsilon\} \le \sigma^2(\tau)/[\varepsilon^2 + \sigma^2(\tau)].$

Let S be a countable set satisfying the definition of separability and containing
the points 0 and $\tau$. Taking the supremum of the left side of (7) over all finite
subsets of $S \cap [0, \tau]$, we obtain

$P\{\sup_{t \in S \cap [0, \tau]} Y_t \ge \varepsilon\} \le \sigma^2(\tau)/[\varepsilon^2 + \sigma^2(\tau)].$

But $P\{\sup_{t \in S \cap [0, \tau]} Y_t \ge \varepsilon\} = P\{\sup_{t \in [0, \tau]} Y_t \ge \varepsilon\}$,
and the proof is complete.
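THEOREM 3.2. Equality can be achieved in (6) if $\sigma^2(\cdot)$ is right continuous.
PROOF. In order to define a martingale that achieves equality in (6), let
$\Omega = \{-1\} \cup [0, \infty)$, $\mathcal{B}$ be the Borel subsets of $\Omega$, and let P be the probability
measure defined on $\mathcal{B}$ by

$P(B) = \varepsilon^2\{\varepsilon^2 + \lim_{x \to \infty} \sigma^2(x)\}^{-1}\chi_B(-1) + \mu(B \cap [0, \infty)),$

where $\chi_E$ is the characteristic function of the set E and $\mu$ is the measure induced
on the Borel subsets of $[0, \infty)$ by the right continuous distribution function
$\sigma^2(\cdot)/[\varepsilon^2 + \sigma^2(\cdot)]$. Let $\{Z_t, t \ge 0\}$ be defined on $(\Omega, \mathcal{B}, P)$ by

$Z_t(\omega) = \begin{cases} -\sigma^2(t)/\varepsilon & \text{if } \omega > t \text{ or } \omega = -1, \\ \varepsilon & \text{if } 0 \le \omega \le t. \end{cases}$

Then

$P\{\sup_{t \in [0, \tau]} Z_t \ge \varepsilon\} = P\{0 \le \omega \le \tau\} = \sigma^2(\tau)/[\varepsilon^2 + \sigma^2(\tau)],$

and it remains only to verify that the process $\{Z_t, t \ge 0\}$ satisfies the conditions
of Theorem 3.1. We compute

$E(Z_t) = [-\sigma^2(t)P\{\omega > t \text{ or } \omega = -1\}/\varepsilon] + \varepsilon P\{0 \le \omega \le t\} = 0,$

and similarly obtain $E(Z_t^2) = \sigma^2(t)$, $t \ge 0$. Clearly $E\{Z_t \mid Z_s = \varepsilon\} = \varepsilon = Z_s$, where
$0 \le s < t$ are fixed. Let $\theta = E\{Z_t \mid Z_s = -\sigma^2(s)/\varepsilon\}$; using the relation

$0 = E(Z_t) = E[E(Z_t \mid Z_s)] = \varepsilon P\{Z_s = \varepsilon\} + \theta P\{Z_s = -\sigma^2(s)/\varepsilon\},$

we obtain $\theta = -\sigma^2(s)/\varepsilon$. Hence the process $\{Z_t, t \ge 0\}$ is a martingale satisfying
the conditions of Theorem 3.1 and achieving equality in (6).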


REFERENCES

[1] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.
[2] J. Hájek and A. Rényi, "Generalization of an inequality of Kolmogorov," Acta Math.
Acad. Sci. Hungar., Vol. 6 (1955), pp. 281-283.
[3] Michel Loève, Probability Theory, D. Van Nostrand, New York, 1955.
[4] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York,
1937.





A ONE-SIDED INEQUALITY OF THE CHEBYSHEV TYPE
By Albert W. Marshall and Ingram Olkin¹
Stanford University; Michigan State University and Stanford University
1. Summary and introduction. If x is a random variable with mean zero and
variance $\sigma^2$, then, according to Chebyshev's inequality, $P\{|x| \ge 1\} \le \sigma^2$. The
corresponding one-sided inequality $P\{x \ge 1\} \le \sigma^2/(\sigma^2 + 1)$ is also known (see
e.g. [2, p. 198]). Both inequalities are sharp.

A generalization of Chebyshev's inequality was obtained by Olkin and Pratt
[1] for $P\{|x_1| \ge 1 \text{ or } \cdots \text{ or } |x_k| \ge 1\}$, where $Ex_i = 0$, $Ex_i^2 = \sigma^2$,
$Ex_i x_j = \sigma^2\rho$ ($i \ne j$), $i, j = 1, \ldots, k$;
we give here the corresponding generalization of the one-sided inequality, and
we consider also the case where only means and variances are known. To obtain
an upper bound for $P\{x \in T\} = P\{x_1 \ge 1 \text{ or } \cdots \text{ or } x_k \ge 1\}$, we consider a
nonnegative function, $f(x) = f(x_1, \ldots, x_k)$, such that $f(x) \ge 1$ for $x \in T$. Then
$Ef(x) \ge \int_{\{x \in T\}} f(x)\, dP \ge P\{x \in T\}$. Since the bound is to be a function of the
covariance matrix, $\Sigma$, f(x) must be of the form $(x - a)A(x - a)'$, where
$a = (a_1, \ldots, a_k)$, $A = (a_{ij})$: $k \times k$. A "best" bound is one which minimizes
$Ef(x) = \operatorname{tr} A(\Sigma + a'a)$, subject to $f(x) \ge 0$, $f(x) \ge 1$ on T.


2. Derivation of the bound. If $D_{1-a} = \operatorname{diag}(1 - a_1, \ldots, 1 - a_k)$,
$z = (x - a)D_{1-a}^{-1}$, and $A^* = D_{1-a}AD_{1-a}$, $B = A^{*-1}$,
then the bound can be written as

(1)    $Ef(x) = \operatorname{tr} A(\Sigma + a'a) = \operatorname{tr} B^{-1}D_{1-a}^{-1}(\Sigma + a'a)D_{1-a}^{-1}.$

Since $f(a) = 0$ and $f(x) \ge 1$ for $x \in T$, $a \notin T$ and the conditions $f(x) \ge 0$,
$f(x) \ge 1$ on T become $zA^*z' \ge 0$, $zA^*z' \ge 1$ for $z \in T$. By the results of [1], the
bound is minimized by a positive definite matrix A for which the corresponding
B has ones on the main diagonal. Thus the problem is to minimize the bound of
(1) subject to $a_i < 1$ ($a \notin T$) and $B = (b_{ij})$ positive definite with all $b_{ii} = 1$.

Let $\mathcal{D}$ be the class of positive definite matrices, $\Delta = (\delta_{ij})$ with $\delta_{ii} = 1$,
$\delta_{ij} = \delta$ ($i \ne j$). By writing $\Delta$ in the form $\Delta = (1 - \delta)I + \delta e'e$, where
$e = (1, \ldots, 1)$, one can show that for any orthogonal matrix $\Gamma$ with first row
$e/\sqrt{k}$,

(2)    $\Gamma\Delta\Gamma' = \operatorname{diag}(1 + (k - 1)\delta, 1 - \delta, \ldots, 1 - \delta),$


Received July 27, 1959; revised November 2, 1959. 
1 This research was sponsored in part by the Office of Ordnance Research at Michigan 
State University and the Office of Naval Research at Stanford University. 




so that $\Delta$ is positive definite if and only if $-(k - 1)^{-1} < \delta < 1$. If $\Sigma/\sigma^2 \in \mathcal{D}$, then
we suspect because of symmetry that the minimizing $B \in \mathcal{D}$, and that $a = \alpha e$.
An example of sharpness would then justify this choice.
Assuming that $a = \alpha e$ and that $B = (1 - b)I + be'e \in \mathcal{D}$, (2) can be used to
write the bound (1) in the form

(3)    $H(\alpha, b) = \operatorname{tr}\,(\Gamma B\Gamma')^{-1}(\Gamma\Sigma\Gamma' + \alpha^2\Gamma e'e\Gamma')/(1 - \alpha)^2 = \frac{k\{\alpha^2 + \sigma^2 + b(\sigma^2 t - \alpha^2)\}}{(1 - \alpha)^2(1 - b)[1 + (k - 1)b]},$

where $t = (k - 1)(1 - \rho) - 1$. The solution of $\partial H/\partial\alpha = 0$ is
$\alpha_0 = -\sigma^2(1 + bt)/(1 - b).$
The equation $dH(\alpha_0, b)/db = 0$ can be written as

(4)    $b^2 t(1 - \sigma^2 t) + 2b(1 - \sigma^2 t) - (\sigma^2 + \rho) = 0,$

and has roots

(5)    $b = -\frac{1}{t} \pm \frac{1}{t}\left\{\frac{(1 + t)(k - 1 - t)}{(k - 1)(1 - \sigma^2 t)}\right\}^{1/2}.$

We assume that $1 - \sigma^2 t > 0$ and use the identity

$1 + t\rho = (1 + t)(k - 1 - t)/(k - 1) = (1 - \rho)[1 + (k - 1)\rho] > 0,$

so that the roots are real. Because B must be positive definite, the lower sign is
not an acceptable solution, and the upper sign is possible if and only if
$k \ge \sigma^2(k - 1)(1 + t)$. We assume this to be the case, and we denote by $b_0$ the
root with the positive sign, and by $B_0$ the corresponding matrix. Evaluation of
$H(\alpha_0, b)$ using (4) yields

(6)    $H(\alpha_0, b) = (k\sigma^2(1 + bt))/([1 + (k - 1)b]\{1 + \sigma^2 - b(1 - \sigma^2 t)\})$
       $\qquad = (k\sigma^2 t)/([k - 2 + t\sigma^2 + \sigma^2(k - 1)] - 2b(k - 1)(1 - \sigma^2 t)).$

Upon substitution for $b_0$, this becomes $H(\alpha_0, b_0) = k\sigma^2 t^2/(u - 2\sqrt{v})$, where
$u = t^2\sigma^2 + t[k - 2 - (k - 1)\sigma^2] + 2(k - 1)$, and
$v = (1 + t)(k - 1 - t)(k - 1)(1 - \sigma^2 t)$.
After rationalizing the denominator and substituting for t, we obtain the
theorem.
THEOREM: Let x be a random vector with $Ex_i = 0$, $Ex_i^2 = \sigma^2$, $Ex_i x_j = \sigma^2\rho$ ($i \ne j$).
If (i) $1 - \sigma^2 t > 0$, (ii) $k \ge \sigma^2(k - 1)(1 + t)$, then

(7)    $P \equiv P\{x_1 \ge 1 \text{ or } \cdots \text{ or } x_k \ge 1\} \le H(\alpha_0, b_0) = \frac{k\sigma^2\{\sqrt{[1 + (k - 1)\rho][1 + \sigma^2 - \sigma^2(k - 1)(1 - \rho)]} + (k - 1)\sqrt{1 - \rho}\}^2}{\{k + \sigma^2[1 + (k - 1)\rho]\}^2};$

otherwise $P \le 1$.







For the special case $\rho = 1$, the inequality reduces to the univariate one-sided
inequality, and, for $\rho = -1/(k - 1)$, the bound is $(k - 1)\sigma^2$, which reduces to
the univariate two-sided inequality for k = 2. It should be noted that the bound
$H(\alpha_0, b_0) \le 1$ is equivalent to $\{\sigma^2(k - 1)((1 - \rho)[1 + (k - 1)\rho])^{1/2} - k[1 + \sigma^2 - \sigma^2(k - 1)(1 - \rho)]^{1/2}\}^2 \ge 0$.
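The bound and its special cases are easy to evaluate numerically. The sketch below, in Python, assumes the closed form (7) as $k\sigma^2\{\sqrt{[1 + (k-1)\rho][1 + \sigma^2 - \sigma^2(k-1)(1-\rho)]} + (k-1)\sqrt{1-\rho}\}^2/\{k + \sigma^2[1 + (k-1)\rho]\}^2$ and the intermediate form $k\sigma^2 t^2/(u - 2\sqrt{v})$; the function names are illustrative and not from the paper.

```python
import math


def chebyshev_one_sided_bound(k, var, rho):
    """Bound (7) when conditions (i) and (ii) hold, and the trivial bound 1 otherwise."""
    t = (k - 1) * (1 - rho) - 1
    cond_i = 1 - var * t > 0
    cond_ii = k >= var * (k - 1) * (1 + t)
    if not (cond_i and cond_ii):
        return 1.0
    p1 = 1 + (k - 1) * rho
    w = 1 + var - var * (k - 1) * (1 - rho)          # equals 1 - var * t
    # max(0, .) guards against tiny negative rounding when rho = -1/(k - 1)
    root = math.sqrt(max(0.0, p1 * w)) + (k - 1) * math.sqrt(max(0.0, 1 - rho))
    return k * var * root ** 2 / (k + var * p1) ** 2


def bound_before_rationalizing(k, var, rho):
    """The same bound written as k*var*t^2 / (u - 2*sqrt(v)); requires t != 0."""
    t = (k - 1) * (1 - rho) - 1
    u = t * t * var + t * (k - 2 - (k - 1) * var) + 2 * (k - 1)
    v = (1 + t) * (k - 1 - t) * (k - 1) * (1 - var * t)
    return k * var * t * t / (u - 2 * math.sqrt(v))
```

For k = 3 and $\sigma^2 = 2$, taking `rho = 1` returns $2/3 = \sigma^2/(1 + \sigma^2)$; for k = 3 and $\sigma^2 = 0.3$, taking `rho = -0.5` returns $0.6 = (k - 1)\sigma^2$, in line with the two special cases just mentioned.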


3. Sharpness. We show sharpness of (7) by exhibiting an example which
achieves equality whenever the conditions (i) and (ii) of the theorem are
satisfied. For cases in which the theorem provides only the trivial bound unity, sharpness
is shown by examples with k = 2.

Let z be a random vector with the following distribution: $P\{z = b^{(i)}\} = p/k$,
$i = 1, \ldots, k$, $P\{z = 0\} = 1 - p$, where $b^{(i)}$ is the ith row of $B_0$. If

$x = (1 - \alpha_0)z + \alpha_0 e$

satisfies the conditions of the theorem, then

(8)    $E(z) = -\alpha_0 e/(1 - \alpha_0) = [1 + (k - 1)b_0]pe/k,$
(9)    $E(z'z) = (\Sigma + \alpha_0^2 e'e)/(1 - \alpha_0)^2 = pB_0^2/k.$

Substituting for $\alpha_0$ in (8) and solving for p, we obtain $p = H(\alpha_0, b_0)$, where
$H(\alpha_0, b_0)$ is given by (6). Because of the special form of $\Sigma$, the matrix equation
(9) is equivalent to the two equations

(10)    $[(1 - b_0)^2 + 2b_0(1 - b_0) + b_0^2 k]p/k = (\sigma^2 + \alpha_0^2)/(1 - \alpha_0)^2,$
(11)    $[2b_0(1 - b_0) + b_0^2 k]p/k = (\sigma^2\rho + \alpha_0^2)/(1 - \alpha_0)^2.$

Substitution of p and $\alpha_0$ in (11) and in (10) minus (11) yields (4) with $b = b_0$
in each case. Hence (8) and (9) are satisfied when $p = H(\alpha_0, b_0)$, that is, when
p is given by the bound of (7). Since $P\{z_i \ge 1 \text{ for some } i\} = p$, and $x_i \ge 1$ if
and only if $z_i \ge 1$, it follows that $x = (1 - \alpha_0)z + \alpha_0 e$ achieves equality in (7).
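The sharpness construction can also be checked numerically. The following Python sketch builds the distribution of $x = (1 - \alpha_0)z + \alpha_0 e$ and compares its moments with the prescribed ones. It is a sketch under stated assumptions: it takes $\alpha_0 = -\sigma^2(1 + b_0 t)/(1 - b_0)$ and $p = H(\alpha_0, b_0)$ in the first form of (6), writes the upper-sign root of (4) in the algebraically equivalent form $b_0 = (\rho + \sigma^2)/[(1 - \sigma^2 t)(1 + r)]$ with $r = \{(1 - \rho)[1 + (k - 1)\rho]/(1 - \sigma^2 t)\}^{1/2}$ (a rewriting made here to avoid dividing by t), and requires $|\rho| < 1$. The function names are illustrative.

```python
import math


def sharpness_example(k, var, rho):
    """Atoms (probability, x) of x = (1 - alpha0) z + alpha0 e, and the mass p."""
    t = (k - 1) * (1 - rho) - 1
    assert 1 - var * t > 0 and k >= var * (k - 1) * (1 + t), "conditions (i), (ii)"
    r = math.sqrt((1 - rho) * (1 + (k - 1) * rho) / (1 - var * t))
    b0 = (rho + var) / ((1 - var * t) * (1 + r))       # upper-sign root of (4)
    alpha0 = -var * (1 + b0 * t) / (1 - b0)
    p = k * var * (1 + b0 * t) / ((1 + (k - 1) * b0) * (1 + var - b0 * (1 - var * t)))
    support = [(1 - p, tuple([alpha0] * k))]           # z = 0
    for i in range(k):                                 # z = i-th row of B_0
        z = [b0] * k
        z[i] = 1.0
        support.append((p / k, tuple((1 - alpha0) * zj + alpha0 for zj in z)))
    return support, p


def moments(support):
    """First moments and the matrix of second moments of the discrete distribution."""
    k = len(support[0][1])
    mean = [sum(q * x[i] for q, x in support) for i in range(k)]
    second = [[sum(q * x[i] * x[j] for q, x in support) for j in range(k)]
              for i in range(k)]
    return mean, second


def event_probability(support, tol=1e-12):
    """P{x_i >= 1 for some i}, with a small tolerance for floating-point rounding."""
    return sum(q for q, x in support if max(x) >= 1 - tol)


def bound_7(k, var, rho):
    """Closed form (7), valid under conditions (i) and (ii)."""
    p1 = 1 + (k - 1) * rho
    w = 1 + var - var * (k - 1) * (1 - rho)
    root = math.sqrt(p1 * w) + (k - 1) * math.sqrt(1 - rho)
    return k * var * root ** 2 / (k + var * p1) ** 2
```

For k = 2, $\sigma^2 = 1/2$, $\rho = -1/2$ the construction gives $b_0 = 0$, $\alpha_0 = -1/2$ and three atoms $(1, -1/2)$, $(-1/2, 1)$, $(-1/2, -1/2)$ of mass 1/3 each, so the event has probability 2/3, which is the value of (7).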

Now suppose that k = 2, in which case conditions (i) and (ii) become
$1 + \sigma^2\rho > 0$, and $2 \ge \sigma^2(1 - \rho)$, respectively.

If $1 + \sigma^2\rho < 0$, then a distribution having the prescribed moments and achieving
the bound of one is $P\{(1, -c)\} = P\{(-c, 1)\} = p_1/2$, $P\{(c, -c)\} = P\{(-c, c)\} = p_2/2$,
$P\{(1, 1)\} = 1 - p_1 - p_2$, where $p_1 = 2\sigma^2(1 + \rho)/(c^2 - 1)$,
$p_2 = (1 + \sigma^2\rho)/(1 - c^2)$, $c = \frac{1}{2}\{\sigma^2(1 + \rho) + [\sigma^4(1 + \rho)^2 + 4\sigma^2]^{1/2}\}$. The
condition $1 + \sigma^2\rho < 0$ implies that $\sigma^2 > 1$ and $c > 1$. Hence $0 < p_1, p_2$, $p_1 + p_2 \le 1$.

If $2 < \sigma^2(1 - \rho)$ and $1 + \sigma^2\rho > 0$, then a distribution with the moments
prescribed in the theorem and achieving the bound of one is

$P\{(1, -c)\} = P\{(-c, 1)\} = p/2, \qquad P\{(d, d)\} = 1 - p,$

where

$p = \frac{2d}{2d + c - 1} = \frac{2\sigma^2(1 - \rho)}{(1 + c)^2}, \qquad c = \frac{-1 + [(1 - \rho^2)(1 + \sigma^2\rho)]^{1/2}}{\rho}$

($c = \sigma^2/2$ if $\rho = 0$). The condition $2 < \sigma^2(1 - \rho)$ implies that $c > 1$, which in
turn implies that $p < 1$. It also implies $1 + c < \sigma^2(1 - \rho)$, which is equivalent
to $d = p(c - 1)/2(1 - p) > 1$.

If $1 + \sigma^2\rho = 0$, then the above distribution with $d = 1$, $c = \sigma^2$, $p = 2/(1 + \sigma^2)$
is the required example.


4. An inequality involving variances only. If $x_1, \ldots, x_k$ are random variables
with $Ex_i = 0$, $Ex_i^2 = \sigma_i^2$, $i = 1, \ldots, k$, then

$P\{|x_1| \ge 1 \text{ or } \cdots \text{ or } |x_k| \ge 1\} \le \sum_i P\{|x_i| \ge 1\} \le \sum_i \sigma_i^2.$

This inequality was proved to be sharp in [1], and the unique distribution attaining
equality has zero covariances.
The corresponding one-sided inequality is

(12)    $P\{x_1 \ge 1 \text{ or } \cdots \text{ or } x_k \ge 1\} \le \sum_i P\{x_i \ge 1\} \le \sum_i \sigma_i^2/(1 + \sigma_i^2).$

If the bound is $\le 1$, the unique distribution attaining equality is

$P\{(-\sigma_1^2, \ldots, -\sigma_{j-1}^2, 1, -\sigma_{j+1}^2, \ldots, -\sigma_k^2)\} = \sigma_j^2/(1 + \sigma_j^2), \qquad j = 1, \ldots, k,$

$P\{(-\sigma_1^2, -\sigma_2^2, \ldots, -\sigma_{k-1}^2, -\sigma_k^2)\} = 1 - \sum_{j=1}^{k} \sigma_j^2/(1 + \sigma_j^2).$

Uniqueness follows by an argument similar to that used in [1]. We note that in
this case the covariances $Ex_i x_j = -\sigma_i^2\sigma_j^2$ are not zero.

An alternative proof of (12) following the procedures of Section 1 is to choose
B = I in (1), and to minimize $\operatorname{tr} D_{1-a}^{-2}(\Sigma + a'a)$ with respect to $a < 1$. The
minimizing $a_i = -\sigma_i^2$.
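The distribution attaining equality in (12) can be verified exactly. Here is a small Python sketch with rational arithmetic; it assumes the atoms are the k points with a single coordinate equal to 1 and the others equal to $-\sigma_i^2$, plus the point $(-\sigma_1^2, \ldots, -\sigma_k^2)$, and the function name `verify_union_example` is illustrative.

```python
from fractions import Fraction


def verify_union_example(variances):
    """Build the distribution for (12), check its moments, return (P{some x_i >= 1}, bound)."""
    sig2 = [Fraction(v) for v in variances]
    k = len(sig2)
    q = [v / (1 + v) for v in sig2]              # mass of the atom with x_j = 1
    bound = sum(q)
    if bound > 1:
        raise ValueError("the example requires the bound to be at most 1")
    base = [-v for v in sig2]
    support = []
    for j in range(k):
        point = list(base)
        point[j] = Fraction(1)
        support.append((q[j], tuple(point)))
    support.append((1 - bound, tuple(base)))
    for i in range(k):
        assert sum(p * x[i] for p, x in support) == 0              # E x_i = 0
        assert sum(p * x[i] ** 2 for p, x in support) == sig2[i]   # E x_i^2 = sigma_i^2
        for j in range(i + 1, k):
            # the covariances are -sigma_i^2 sigma_j^2, not zero
            assert sum(p * x[i] * x[j] for p, x in support) == -sig2[i] * sig2[j]
    event = sum(p for p, x in support if max(x) >= 1)
    return event, bound
```

With variances 1/4, 1/3 and 1/5 the three terms $\sigma_j^2/(1 + \sigma_j^2)$ are 1/5, 1/4 and 1/6, so both returned values are 37/60.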


REFERENCES
[1] Ingram Olkin and John W. Pratt, "A multivariate Tchebycheff inequality," Ann.
Math. Stat., Vol. 29 (1958), pp. 226-234.
[2] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York,
1937.





ON THE UNIQUENESS OF THE TRIANGULAR ASSOCIATION SCHEME 


By A. J. Hoffman
General Electric Company


1. Summary. Connor [3] has shown that the relations among the parameters
of the triangular association scheme themselves imply the scheme if n ≥ 9.
This result was shown by Shrikhande [6] to hold also if n ≤ 6. (The problem
has no meaning for n < 4.) This paper shows that the result holds if n = 7, but
that it is false if n = 8.


2. Introduction. A partially balanced incomplete block design with two
associate classes [1] is said to be triangular [2], [3] if the number of treatments, v,
is n(n - 1)/2 for some integer n, and the association scheme is obtainable as
follows:

Let the v treatments be regarded as all possible arcs of the graph determined
by n points; let the first associates of any arc (= treatment) be all arcs each of
which shares exactly one end point with the given arc; let the second associates
of any arc be all arcs each of which does not share an end point with the given
arc and does not coincide with the given arc.

Then the following relations hold:

(2.1) The number of first associates for any treatment is 2(n - 2).

(2.2) If $\theta_1$ and $\theta_2$ are two treatments which are first associates, the number
of treatments which are first associates of both $\theta_1$ and $\theta_2$ is n - 2.

(2.3) If $\theta_1$ and $\theta_2$ are second associates, the number of treatments which
are first associates of both $\theta_1$ and $\theta_2$ is 4.

It is natural to inquire if conditions (2.1)-(2.3) imply that the v = n(n - 1)/2
treatments can be represented as arcs on the graph determined by n points in
the manner described above; i.e., if (2.1)-(2.3) imply the triangular association
scheme. This is known ([3], [6]) to be so if n ≠ 7, 8.

We prove the result for 7. Actually we will prove the result for all n except 8.
For n = 8, the theorem is false, as we shall demonstrate by exhibiting a
counter-example. The derivation of this counter-example and a procedure for finding all
counter-examples are given in [4]. They are based on an elaboration of the
devices used in Sections 3 and 4 of this paper. Other illustrations of the use of
these devices are contained in [5].

Henceforth, we assume (2.1)-(2.3).
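The construction of the triangular scheme and the relations (2.1)-(2.3) can be illustrated with a few lines of Python. This is a minimal sketch: treatments are the 2-element subsets of n points, two treatments are first associates when the subsets share exactly one point, and the helper names are illustrative rather than taken from the paper.

```python
from itertools import combinations


def triangular_scheme(n):
    """Return the arcs on n points and the 0/1 matrix of the first-associate relation."""
    arcs = list(combinations(range(n), 2))
    A = [[1 if a != b and set(a) & set(b) else 0 for b in arcs] for a in arcs]
    return arcs, A


def satisfies_relations(A, n):
    """Check (2.1), (2.2) and (2.3) for a symmetric 0/1 matrix A with zero diagonal."""
    v = len(A)
    for i in range(v):
        if sum(A[i]) != 2 * (n - 2):                       # (2.1)
            return False
        for j in range(i + 1, v):
            common = sum(A[i][m] & A[j][m] for m in range(v))
            expected = n - 2 if A[i][j] else 4             # (2.2) / (2.3)
            if common != expected:
                return False
    return True
```

For n = 5 there are v = 10 treatments, each with 6 first associates; `satisfies_relations(triangular_scheme(5)[1], 5)` evaluates to `True`.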


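3. The Association Matrix. Number the treatments from 1 to v in any order.
Define the square matrix A of order v by

(3.1)    $A = (a_{ij})$, $\quad a_{ij} = \begin{cases} 0 & \text{if } i = j, \\ 1 & \text{if } i \text{ and } j \text{ are first associates,} \\ 0 & \text{if } i \text{ and } j \text{ are second associates.} \end{cases}$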

Received August 31, 1959. 




Note that $a_{ij} = a_{ji}$. Next let $B = AA^T = A^2$, since A is symmetric. From (2.1),
we have $b_{ii} = 2(n - 2)$. From (2.2), we have $b_{ij} = n - 2$ if i and j are first
associates. From (2.3), we have $b_{ij} = 4$ if i and j are second associates. If we
let J be the square matrix of order v, with every entry unity, and I the identity
matrix of order v, then the foregoing may be summarized by

(3.2)    $A^2 = 2(n - 2)I + (n - 2)A + 4(J - I - A) = (2n - 8)I + (n - 6)A + 4J.$

All the matrices appearing in (3.2) can be simultaneously diagonalized. Imagine
(3.2) in diagonal form, and one sees that the diagonal entries relate the
eigenvalues of the matrices.

Now J has the eigenvalue v, corresponding to the eigenvector (1, 1, ..., 1);
all other eigenvalues of J are zero. The eigenvector (1, 1, ..., 1) clearly
corresponds to the eigenvalue 2(n - 2) of A. Any other eigenvalue, $\alpha$, of A
corresponds to a zero eigenvalue of J; hence (3.2) implies that $\alpha$ satisfies the
equation $\alpha^2 = (2n - 8) + (n - 6)\alpha$, so that $\alpha = -2$, or $\alpha = n - 4$.

The trace of A is zero, since $a_{ii} = 0$ for all i; hence the sum of the eigenvalues
of A is 0. If k is the multiplicity of n - 4, it follows that
$0 = 2n - 4 + k(n - 4) + (v - k - 1)(-2)$. So the eigenvalues of A are

(3.3)    (a) 2n - 4 with multiplicity 1, eigenvector (1, 1, ..., 1)
         (b) n - 4 with multiplicity n - 1
         (c) -2 with multiplicity v - n.

Note that v > n, so -2 is the least eigenvalue of A.

This is the only use we shall make of (3.3) (c) in the present paper, although
it plays a major role in the analysis of the exceptional cases for n = 8. We shall
make no use of (3.3) (b).
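The matrix identity (3.2) and the eigenvalue list (3.3) can be confirmed for the triangular scheme itself with integer arithmetic only. The Python sketch below is an illustration under the assumption that A is the matrix of the triangular scheme on n points; instead of computing eigenvalues numerically, it checks that A is annihilated by $(A - (2n - 4)I)(A - (n - 4)I)(A + 2I)$ and solves the trace condition for the multiplicity k. All function names are illustrative.

```python
from itertools import combinations


def triangular_matrix(n):
    """First-associate matrix of the triangular scheme on n points."""
    arcs = list(combinations(range(n), 2))
    return [[1 if a != b and set(a) & set(b) else 0 for b in arcs] for a in arcs]


def matmul(X, Y):
    size = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def shifted(A, c):
    """Return A - c*I."""
    size = len(A)
    return [[A[i][j] - (c if i == j else 0) for j in range(size)] for i in range(size)]


def satisfies_3_2(A, n):
    """Check A^2 = (2n - 8) I + (n - 6) A + 4 J entry by entry."""
    v = len(A)
    A2 = matmul(A, A)
    return all(A2[i][j] == (2 * n - 8) * (i == j) + (n - 6) * A[i][j] + 4
               for i in range(v) for j in range(v))


def annihilated_by_spectrum(A, n):
    """Check (A - (2n-4)I)(A - (n-4)I)(A + 2I) = 0, so no other eigenvalue occurs."""
    P = matmul(matmul(shifted(A, 2 * n - 4), shifted(A, n - 4)), shifted(A, -2))
    return all(entry == 0 for row in P for entry in row)


def multiplicity_of_n_minus_4(n):
    """Solve 0 = (2n - 4) + k(n - 4) - 2(v - k - 1) for k, with v = n(n - 1)/2."""
    v = n * (n - 1) // 2
    return (2 * v - 2 * n + 2) // (n - 2)
```

Since $2v - 2n + 2 = (n - 1)(n - 2)$, the trace condition gives k = n - 1, which is (3.3) (b).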

In what follows, we shall use two well-known properties of eigenvalues and
eigenvectors of symmetric matrices, and for ease of reference, we now list
them explicitly.

Let M be a (real) symmetric matrix whose least eigenvalue is $\beta$, and whose
maximum eigenvalue is $\alpha > \beta$, with x an eigenvector corresponding to $\alpha$. Let
K be a principal submatrix of M, $\delta$ the least eigenvalue of K and y an eigenvector
of K corresponding to $\delta$. Then

(3.4)    $\delta \ge \beta$;

and

(3.5)    if $\delta = \beta$, then y is orthogonal to the projection of x on the subspace
corresponding to K.

From (3.4) and (3.3) (c) follows the fact that a principal submatrix of A
cannot have an eigenvalue less than -2. From (3.5) and (3.3) (a) and (c),
if -2 is an eigenvalue of a principal submatrix of A, then the corresponding
eigenvector has zero as the sum of its co-ordinates.





4. The Case n ≠ 8.
Lemma 1. A does not contain

(4.1)
         0 0 0 1
         0 0 0 1
         0 0 0 1
         1 1 1 0

as a principal submatrix.

This was proved by Connor [3] for n ≥ 9. We now prove it for all n ≠ 8. We
contend that A cannot contain any of the following three square matrices of
order 5, each of which contains (4.1) as a principal submatrix:

           (4.2)              (4.3)              (4.4)
         0 0 0 1 1          0 0 0 1 1          0 0 0 1 1
         0 0 0 1 1          0 0 0 1 1          0 0 0 1 1
         0 0 0 1 1          0 0 0 1 1          0 0 0 1 0
         1 1 1 0 0          1 1 1 0 1          1 1 1 0 0
         1 1 1 0 0          1 1 1 1 0          1 1 0 0 0

The impossibility of (4.2) and (4.4) follows from (3.4), since each has an
eigenvalue smaller than -2. Matrix (4.3) has -2 as an eigenvalue, with
(1, 1, 1, -1, -1) as corresponding eigenvector, violating (3.5).
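The eigenvalue claims about the three matrices of order 5 can be checked without a numerical eigenvalue routine. The Python sketch below assumes a particular reading of (4.2)-(4.4), namely the matrix (4.1) bordered by a fifth treatment that is a first associate of treatments 1, 2, 3 only, of 1, 2, 3, 4, and of 1, 2 only, respectively; this reading is inferred from how the matrices are used in the proof and is not certain. A vector x with $x'(M + 2I)x < 0$ shows, by the Rayleigh quotient, that M has an eigenvalue below -2.

```python
# Assumed readings of (4.2), (4.3), (4.4): the claw (4.1) plus a fifth treatment.
M42 = [[0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 0, 0],
       [1, 1, 1, 0, 0]]
M43 = [[0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 0, 1],
       [1, 1, 1, 1, 0]]
M44 = [[0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0],
       [1, 1, 1, 0, 0],
       [1, 1, 0, 0, 0]]


def shifted_quadratic_form(M, x):
    """Return x'(M + 2I)x; a negative value proves an eigenvalue smaller than -2."""
    size = len(x)
    xMx = sum(x[i] * M[i][j] * x[j] for i in range(size) for j in range(size))
    return xMx + 2 * sum(xi * xi for xi in x)


def times(M, x):
    """Matrix-vector product M x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


# (4.2) and (4.4): witnesses of an eigenvalue below -2, contradicting (3.4).
value_42 = shifted_quadratic_form(M42, [2, 2, 2, -3, -3])
value_44 = shifted_quadratic_form(M44, [2, 2, 1, -3, -2])

# (4.3): (1, 1, 1, -1, -1) is an eigenvector for -2 whose coordinates sum to 1, not 0,
# which is what contradicts (3.5).
y = [1, 1, 1, -1, -1]
is_eigenvector_43 = times(M43, y) == [-2 * yi for yi in y]
```

The two witness vectors were chosen by hand; any vector giving a negative value would serve equally well.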

Let us denote by 1, 2, 3, 4 respectively the rows and columns of A that
produced submatrix (4.1). Because (4.2) and (4.3) are impossible, it follows that
4 is the only treatment that is a first associate of 1, 2, and 3. Hence, by (2.3),
there are exactly nine additional treatments, each of which is a first associate
of two of the set 1, 2, 3. Since (4.4) is impossible, it follows that each of the nine
is a first associate of 4. Together with 1, 2, 3, this yields twelve treatments,
each of which is a first associate of 4. From (2.1), we must have 12 ≤ 2n - 4,
which is impossible if n ≤ 7.

Now suppose n ≥ 9. Treatments 1 and 4 are first associates, and, by (2.2),
there are n - 2 first associates of each. We have previously encountered 6,
three of which are first associates also of 2, and three of which are also first
associates of 3. Hence there are n - 8 additional ones. Similarly, there are n - 8
additional first associates of 2 and 4, and n - 8 additional first associates of
3 and 4. Hence, from (2.1), 2(n - 2) ≥ 12 + 3(n - 8), which is impossible
for n ≥ 9.

Next, we prove

Lemma 2. If 1 and 2 are second associates, 3, 4, 5, 6 first associates of both 1 and
2, then (after renumbering, if necessary) the principal submatrix of A corresponding
to rows and columns 1-6 is

(4.5)
         0 0 1 1 1 1
         0 0 1 1 1 1
         1 1 0 0 1 1
         1 1 0 0 1 1
         1 1 1 1 0 0
         1 1 1 1 0 0

PROOF: Consider the 2(n - 2) treatments which are first associates of 3.
None of them can be second associates of both 1 and 2, for this would violate
Lemma 1. Hence, if we let t be the number of first associates of 3 which are first
associates of 1 and 2, we have from (2.1) and (2.2), t + (n - 2 - t) +
(n - 2 - t) = 2(n - 2) - 2, or t = 2. These two must be some two of
4, 5, 6, say 5 and 6. It follows that 3 and 4 are second associates, while 3 is a
first associate of both 5 and 6. The inevitability of (4.5) is now clear.

Lemma 3. Any matrix of form

(4.6)    [matrix (4.6), which contains an undetermined entry x, is not legible in this copy]

is not a principal submatrix of A.

PROOF: If (4.6) were to exist, then x ≠ 1. For 6 and 7 would be second
associates, and, if x = 1, then 1, 3, and 8 would mutually be first associates, but this
contradicts Lemma 2. So we must take x = 0. But then 2, 7, and 8 are pairwise
second associates; 3 is a first associate of each of 2, 7, 8, and this violates Lemma
1.

Lemma 4. The matrix

(4.7)    [matrix (4.7) is not legible in this copy]

is not a principal submatrix of A.

PROOF: All we want to show is that the other entries in (4.7) imply that 7 and
8 are first associates, not second associates as (4.7) alleges. If 7 and 8 are second
associates, then using the same reasoning as in the first part of Lemma 3, some
two of 1, 3, 5 are by Lemma 2 second associates. But this is not so in (4.7).

Lemma 5. The 2(n — 2) first associates of any treatment can be split into two 
classes so that the n — 2 treatments of one class are mutually first associates of each 
other; the n — 2 treatments of the other class are mutually first associates. 

PROOF: Let 1 be the treatment. Let 3 be a first associate of 1, 2 a second
associate of 1 and a first associate of 3, and 4, 5, 6 chosen so that we have the
submatrix of Lemma 2. In addition to 5 and 6, there are n - 4 other first
associates of both 1 and 3. Each of these must be a first associate of at least one of
5 and 6. Otherwise it, 5 and 6, would be mutually second associates, and 1
would be a first associate of each of the three, violating Lemma 1. Further, by 
Lemma 3, each of these n — 4 treatments is a first associate of 5 or each is a 
first associate of 6. Without loss of generality, say it is 5. By Lemma 4, these 
n — 4 treatments are mutually first associates. Further, each is a first associate 
of 3 and 5, which are themselves first associates, and thus 3, 5, and these n — 4 
treatments are altogether n — 2 first associates of 1, which are mutually first 
associates. 

Of the n — 2 first associates of 1 and 4, 5 is in the class already described, 6 
is not, and there are n — 4 others. These n — 4 are mutually first associates by 
the same reasoning as above; they are entirely different from the previous 
n — 4 of the first class, since each of those was a second associate of 4; each is 
obviously a first associate of 6 as well as 4; so 4, 6, and these n — 4 treatments 
constitute our second class. 

THEOREM 1. If n ≠ 8, then conditions (2.1)-(2.3) characterize the triangular
association scheme.

PROOF: It has been shown by Shrikhande [6] that Lemma 5 implies Theorem
1.


5. The Case n = 8. 

THEOREM 2. If n = 8, then conditions (2.1)-(2.3) do not necessarily imply
the triangular association scheme.

PROOF: Here is a counter-example. Notice that the first principal submatrix
of order 5 violates the triangular association scheme.


[The 28 × 28 association matrix of the counter-example is not legible in this copy.]






REFERENCES 

[1] R. C. Bose and K. R. Nair, "Partially balanced incomplete block designs," Sankhyā,
Vol. 4 (1939), pp. 337-372.

[2] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced
designs with two associate classes," J. Amer. Stat. Assn., Vol. 47 (1952), pp. 151-190.

[3] W. S. Connor, "The uniqueness of the triangular association scheme," Ann. Math.
Stat., Vol. 29 (1958), pp. 262-266.

[4] A. J. Hoffman, "On the exceptional case of a characterization of the arcs of a
complete graph," to appear in IBM Journal of Research.

[5] A. J. Hoffman and R. R. Singleton, "On a graph-theoretic problem of E. F. Moore,"
to appear in IBM Journal of Research.

[6] S. S. Shrikhande, "On a characterization of the triangular association scheme," Ann.
Math. Stat., Vol. 30 (1959), pp. 39-47.




NOTE 


The results of this paper have also been obtained, using different methods, by
L. C. Chang, "The Uniqueness and Nonuniqueness of the Triangular Association Schemes," Science
Record, Vol. III, New Series, 1959, pp. 604-613. Chang has also shown that there are exactly
three counterexamples when $n = 8$ ("Association Schemes of Partially Balanced Designs
with Parameters $v = 28$, $n_1 = 12$, $n_2 = 15$ and $p_{11}^2 = 4$," Science Record, Vol. IV, New
Series, 1960, pp. 12-18).





PROPER SPACES RELATED TO TRIANGULAR PARTIALLY 
BALANCED INCOMPLETE BLOCK DESIGNS¹


By L. C. A. Corsten 


Institute for Research on Varieties of Field Crops, Wageningen, The Netherlands


1. Summary. The proper spaces of the matrix NN’, where N is the incidence 
matrix of a triangular partially balanced incomplete block design, are exhibited 
explicitly; this provides a convenient form for the Gramian of a basis of the join
of two of these spaces. 


2. Introduction. The matrix $N$, the incidence matrix for an incomplete block
design, is the matrix with $v$ rows ($v$ is the number of treatments) and $b$ columns
($b$ is the number of blocks) where the typical element, $n_{ij}$, is unity if the $i$th
treatment occurs in the $j$th block, and is zero otherwise. The non-negative
symmetric matrix $Q = NN'$ of order $v$ has elements $q_{ij}$, where $q_{ii}$ is equal to
the number of replicates of the $i$th treatment, and $q_{ij}$ ($i \neq j$) is equal to the
number of blocks in which the $i$th and the $j$th treatment occur together.

In the case of partially balanced incomplete block designs with two associate
classes, of which the class of triangular designs, as developed by R. C. Bose and
T. Shimamoto [1], is a subclass, the numbers $q_{ii}$ are all equal (to $r$, the common
number of replications), while the $q_{ij}$ ($i \neq j$) are either $\lambda_1$ or $\lambda_2$, depending on
whether the pair of treatments $i$ and $j$ are first or second associates respectively.

The knowledge of proper values and spaces of Q is of interest in finding condi- 
tions of existence of designs with given sets of parameters; in addition it can con- 
tribute to a better understanding of the analysis of actually constructed designs 
and lead to the attachment of a physical meaning to their association schemes. 
We will pay attention to the last-mentioned points in a later paper. 

Knowledge of the proper values of Q for several cases, including the triangular 
designs, as given by Connor and Clatworthy [2] has already been utilized for the 
derivation of necessary conditions for the parameters of such designs. Knowledge 
of the proper spaces of Q for triangular designs, as will be shown in this note, 
provided Ogawa [3] other conditions for the existence of such designs. 


3. The proper spaces of Q. In order to consider the proper values and spaces
of $Q = NN'$ we conceive $Q$ as the matrix of the linear transformation $Q$ of a vector
space $A$ consisting of vectors $x = (x_1, x_2, \ldots, x_v)$ into itself, where the
coordinate $x_i$ corresponds to the $i$th treatment.


Received August 27, 1959; revised November 8, 1959. 

¹ This research was performed while the author was at the Department of Statistics of the
University of North Carolina and was sponsored jointly by the United States Air Force 
through the Air Force Office of Scientific Research of the Air Research and Development 
Command under Contract No. AF 49 (638)-213 and by a grant of the Netherlands Organiza- 
tion for Pure Research (Z.W.O.). Reproduction in whole or in part for any purpose of the 
United States Government is permitted. 




Since triangular designs are special cases of partially balanced incomplete
block designs with two associate classes, the number of first and second associates
of a fixed treatment is independent of the chosen fixed treatment. Hence
the $i$th coordinate $y_i$ in $y = Qx$ is equal to $r x_i + \lambda_1 S_1 + \lambda_2 S_2$, where $S_j$ ($j = 1, 2$)
represents the sum of the coordinates in $x$ corresponding to the $j$th associates of
treatment $i$.

We see immediately that if $x = (1, 1, \ldots, 1)$ then $y_i$ is equal to
$r + \lambda_1 n_1 + \lambda_2 n_2$, where $n_j$ ($j = 1, 2$) is the number of $j$th associates of treatment $i$. As the $n_j$
are independent of $i$ in all partially balanced incomplete block designs it follows
that $s = (1, 1, \ldots, 1)$ is a proper vector of $Q$ with proper value
$r + \lambda_1 n_1 + \lambda_2 n_2 = rk$, where $k$ is the block size.

For further investigation of the proper spaces of $Q$ we shall consider the
$(v - 1)$-dimensional subspace $A^*$ of $A$ orthogonal to $s$. For every vector $x$ in
$A^*$, $x_i + S_1 + S_2 = 0$. Hence the coordinate $y_i$ in $Qx$, if $x$ is restricted to $A^*$, is


(1)    $(r - \lambda_2) x_i + (\lambda_1 - \lambda_2) S_1$.


For triangular designs with parameter $n$ ($n$ is an integer greater than 3), the
first and second associates of each treatment can be read from an association
scheme which is constructed as follows. Consider an $n$ by $n$ square array in which
the diagonal positions are denoted by * and where the remaining $n(n - 1)$
positions each contain one of the $\tfrac{1}{2}n(n - 1)$ treatment indices such that each
index occurs twice and symmetrically with respect to the diagonal. For $n = 5$,
for example, this might be


        *   1   2   3   4
        1   *   5   6   7
        2   5   *   8   9
        3   6   8   *  10
        4   7   9  10   *


The first associates of any treatment are all those treatments which occur in the 
same row or the same column as this treatment, while the second associates are 
those which do not occur in the same row or the same column as this treatment. 
We note that $n_1 = 2(n - 2)$ in this case.
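The rule for reading off first associates can be sketched in a few lines of Python. This is only an illustration: the function name `triangular_first_associates` and the labelling of treatments by unordered pairs of row and column numbers are conveniences introduced here, not notation from the paper.

```python
from itertools import combinations

def triangular_first_associates(n):
    """Return {treatment: set of first associates} for the triangular scheme.

    Treatments 1, 2, ... label the unordered pairs {p, q} of rows/columns in
    the order in which they appear in the upper triangle of the n-by-n array.
    Two treatments share a row or a column exactly when their pairs overlap.
    """
    pairs = list(combinations(range(n), 2))
    first = {}
    for i, a in enumerate(pairs, start=1):
        first[i] = {j for j, b in enumerate(pairs, start=1)
                    if j != i and set(a) & set(b)}
    return first

first = triangular_first_associates(5)
print(sorted(first[1]))                    # first associates of treatment 1
print({len(f) for f in first.values()})    # the common value of n_1
```

For $n = 5$ every treatment has the same number of first associates, which is the count $2(n - 2)$ noted in the text.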

For convenience, we write the coordinates of any vector $x$ in $A$ in the same
arrangement as the corresponding indices in the upper diagonal part of the
association scheme. We now construct a set of $n$ vectors, $c_1, c_2, \ldots, c_n$, in $A$ in the
following way. Write unity in all the positions of which the corresponding indices
occur in the $p$th row of the association scheme; write zero everywhere else. The
resulting vector is called $c_p$.

In our example with $n = 5$ we obtain

        1 1 1 1      1 0 0 0      0 1 0 0      0 0 1 0      0 0 0 1
          0 0 0        1 1 1        1 0 0        0 1 0        0 0 1
            0 0          0 0          1 1          1 0          0 1
              0            0            0            1            1

for $c_1, \ldots, c_5$ respectively.







Let the $n$-dimensional subspace of $A$ spanned by these $n$ linearly independent
vectors be called $A_1$. We note that $A_1$ contains the one-dimensional space spanned
by $s$. Now consider the inner product of any vector in $A_1$, $\gamma_1 c_1 + \cdots + \gamma_n c_n$,
and $s$; this is equal to $\gamma_1 (c_1, s) + \cdots + \gamma_n (c_n, s) = (n - 1)(\gamma_1 + \cdots + \gamma_n)$.
Hence, for any vector in $A_1^*$, the $(n - 1)$-dimensional subspace of $A_1$ orthogonal
to $s$, we have $\gamma_1 + \cdots + \gamma_n = 0$.

We further note that, if treatment $i$ occurs in the $p$th row and the $q$th column
of the association scheme, the coordinate $x_i$ of any vector $\gamma_1 c_1 + \cdots + \gamma_n c_n$ in $A_1$
corresponding to the $p$th row and the $q$th column of the association scheme is
equal to $\gamma_p + \gamma_q$.

Let $x$ be any vector in $A_1^*$. Then the sum of the coordinates of $x$ corresponding
to all the treatments in the row and the column of the association scheme in
which treatment $i$ occurs (treatment $i$ is counted twice in this sum) is equal to
$2x_i + S_1$ on the one hand; on the other hand, according to the last two
paragraphs, it is equal to


$\{(n - 1)\gamma_p + (\gamma_1 + \cdots + \gamma_n) - \gamma_p\} + \{(n - 1)\gamma_q + (\gamma_1 + \cdots + \gamma_n) - \gamma_q\}$
$= (n - 2)(\gamma_p + \gamma_q) = (n - 2)x_i$.


Hence $S_1 = (n - 4)x_i$ for all vectors in $A_1^*$.

Now it follows from (1) that the coordinate $y_i$ of $Qx$, where $x$ is restricted to
$A_1^*$, is $\{r + (n - 4)\lambda_1 - (n - 3)\lambda_2\}x_i$. Therefore $A_1^*$ is a proper space of $Q$
with proper value $r + (n - 4)\lambda_1 - (n - 3)\lambda_2$.

Finally we consider the (orthogonal) complement of $A_1$ with respect to $A$
(which of course is the same as the complement of $A_1^*$ with respect to $A^*$) and
call this $A_2$. The dimension of $A_2$ is $\tfrac{1}{2}n(n - 1) - n = \tfrac{1}{2}n(n - 3)$.

Since every vector in $A_2$ is orthogonal to the given $n$ basis vectors of $A_1$, the
sum of its coordinates corresponding to a row (or a column) of the association
scheme must be zero. Taking the sum of its coordinates corresponding to the
row and the column of the association scheme in which treatment $i$ occurs
(treatment $i$ is counted twice again), we find that $2x_i + S_1 = 0$.

Again, from (1) it follows that the coordinate $y_i$ of $Qx$, where $x$ is now restricted
to $A_2$, is $(r - 2\lambda_1 + \lambda_2)x_i$. Therefore $A_2$ is also a proper space of $Q$ and the
corresponding proper value is $r - 2\lambda_1 + \lambda_2$.
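The three proper values and the dimensions of their spaces can be checked numerically. The sketch below is only an illustration: the helper name `triangular_Q` and the numbers $r = 6$, $\lambda_1 = 2$, $\lambda_2 = 1$ are assumptions chosen for arithmetic convenience, and no claim is made that a design with these parameters exists. Only the pattern of $Q$ matters for the check.

```python
import numpy as np
from itertools import combinations

def triangular_Q(n, r, lam1, lam2):
    """Matrix with r on the diagonal, lam1 for first and lam2 for second associates."""
    pairs = list(combinations(range(n), 2))
    v = len(pairs)
    Q = np.empty((v, v))
    for i, a in enumerate(pairs):
        for j, b in enumerate(pairs):
            if i == j:
                Q[i, j] = r
            elif set(a) & set(b):      # same row or column of the scheme
                Q[i, j] = lam1
            else:
                Q[i, j] = lam2
    return Q

n, r, lam1, lam2 = 5, 6, 2, 1          # illustrative values only
Q = triangular_Q(n, r, lam1, lam2)
eigenvalues = np.sort(np.linalg.eigvalsh(Q))

n1 = 2 * (n - 2)
n2 = n * (n - 1) // 2 - 1 - n1
expected = sorted(
    [r + lam1 * n1 + lam2 * n2]                                   # vector s
    + [r + (n - 4) * lam1 - (n - 3) * lam2] * (n - 1)             # space A_1^*
    + [r - 2 * lam1 + lam2] * (n * (n - 3) // 2)                  # space A_2
)
print(np.allclose(eigenvalues, expected))
```

The multiplicities used in `expected` are the dimensions $1$, $n - 1$ and $\tfrac{1}{2}n(n - 3)$ derived in this section.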


4. The Gramian of $A_1$. In connection with conditions for constructibility one
is interested (see Ogawa [3]) in the Gramian, the symmetric matrix of inner
products of a set of basis vectors of proper spaces of $Q$ or of joins of such spaces.
In the present case it is quite easy to find the Gramian of the given basis of $A_1$,
the join of the proper space $A_1^*$ and the proper space spanned by $s$. We simply
need the inner products of the vectors $c_1, \ldots, c_n$.

The diagonal elements of this Gramian are all equal to the number of unities 
in these basis vectors, i.e. n — 1, while the off-diagonal elements are equal to the 
number of treatments which two different rows of the association scheme have 
in common, namely 1. 
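Written out, the description above fixes the whole matrix. As an illustration (the symbols $G$, $I_n$ and $J_n$ are labels introduced here, with $J_n$ the $n \times n$ matrix of ones), the Gramian and its determinant are

```latex
G = \bigl( (c_p, c_q) \bigr)_{p,q = 1}^{n} = (n - 2)\, I_n + J_n ,
\qquad
\det G = 2(n - 1)(n - 2)^{\,n-1} .
```

The determinant follows because $J_n$ has proper values $n$ (once) and $0$ ($n - 1$ times), so $G$ has proper values $2(n - 1)$ once and $n - 2$ with multiplicity $n - 1$.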







REFERENCES 

[1] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced
incomplete block designs with two associate classes," J. Amer. Stat. Assn., Vol. 47
(1952), pp. 151-154.

[2] W. S. Connor and W. H. Clatworthy, "Some theorems for partially balanced
designs," Ann. Math. Stat., Vol. 25 (1954), pp. 100-112.

[3] J. Ogawa, "A necessary condition for existence of regular and symmetrical experimental
designs of triangular type, with partially balanced incomplete blocks," Ann.
Math. Stat., Vol. 30 (1959), pp. 1063-1071.





BALANCED FACTORIAL EXPERIMENTS¹
By B. V. SHAH²
University of Bombay 


1. Introduction and summary. Usually, in a factorial experiment, the block 
size of the experiment is not large enough to permit all possible treatment com- 
binations to be included in a block. Hence we resort to the theory of confounding. 
With respect to symmetric factorial designs, the theory of confounding has been 
highly developed by Bose [1], Bose and Kishen [2] and Fisher [4], [5]. An excel- 
lent summary of the results of this research appears in Kempthorne [6]. Some 
examples of asymmetric factorial designs can be found in Yates [14], Cochran and 
Cox [3], Li [8], Kempthorne [6] and Nair and Rao [9], [10]. Nair and Rao [11] 
have given the statistical analysis of a class of asymmetrical two-factor designs 
in considerable detail. The author [13] has considered the problem of achieving 
“complete balance” over various interactions in factorial experiments. In the
present paper a class of factorial experiments, balanced factorial experiments 
(BFE) (Definition 4.2) is considered. The theorems proved in Section 5 outline 
a detailed analysis of BFE’s, including estimates of various interactions at differ- 
ent levels. Finally, a method of constructing BFE’s is given in Section 6. 

It should be noted that Theorems 5.2 to 5.5 are generalisations of the
corresponding theorems by Zelen [15], and the method of construction in Section 6
is a general form of the one indicated by Yates [14], Nair and Rao [9], [10] and
Kempthorne [6] (Section 18.7).


2. Notation. Let there be $v$ treatments, each replicated $r$ times in $b$ blocks of
$k$ plots each. Let $N = [n_{ij}]$ ($i = 1, 2, \ldots, v$; $j = 1, 2, \ldots, b$) be the incidence
matrix of the design, where $n_{ij}$ is equal to the number of times the $i$th treatment
occurs in the $j$th block. The set up assumed is


(2.1)    $y_{ij} = \mu + t_i + b_j + \epsilon_{ij}$,


where $y_{ij}$ is the yield of the plot in the $j$th block to which the $i$th treatment is
applied, $\mu$ is the over all effect, $t_i$ is the effect of the $i$th treatment, $b_j$ is the effect
of the $j$th block, and $\epsilon_{ij}$ is the experimental error. The effects $\mu$, $t_i$, $b_j$ are assumed
to be fixed constants, while the errors $\epsilon_{ij}$ are assumed to be independent normal
variates with mean zero and variance $\sigma^2$. Let $T_i$ be the total yield of all the plots
having the $i$th treatment, $B_j$ be the total yield of all the plots of the $j$th block
and $\hat{t}_i$ be a solution for $t_i$ in the normal equations. Further denote the column
vectors with elements $\{T_1, T_2, \ldots, T_v\}$, $\{B_1, B_2, \ldots, B_b\}$, $\{t_1, t_2, \ldots, t_v\}$ and

Received September 2, 1958; revised April 2, 1959. 

1 This work was supported by a Research Training Scholarship of the Government of 
India. 

² Present address, Iowa State University.


$\{\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_v\}$ by $T$, $B$, $t$ and $\hat{t}$ respectively. It is well known that the reduced
normal equations for the intra-block estimates of treatment contrasts are

(2.2)    $Q = C\hat{t}$,

where

(2.3)    $Q = T - \frac{1}{k} NB$

and

(2.4)    $C = r I(v) - \frac{1}{k} NN'$,

where $I(v)$ is the $v \times v$ identity matrix. The matrix $C$ defined in (2.4) will be
called the C-matrix of the design.


3. Some useful results. 


DEFINITION 3.1. If $l'l = 1$ ($l$ is a $v \times 1$ matrix), the contrast $l't$ will be called
a normalised contrast.


DEFINITION 3.2. A normalised contrast $l't$ will be called a canonical contrast
of the design, if $l$ is a canonical vector of the C-matrix of the design.


LEMMA 3.1. A necessary and sufficient condition for a normalised contrast $l't$ to
be a canonical contrast is

(3.1)    $l'Q = \theta\, l'\hat{t}$,

where

(3.2)    $\theta = r - \frac{1}{k}(l'NN'l)$.


LEMMA 3.2. A canonical contrast $l't$ is estimable, if the $\theta$ given by (3.2) is not
equal to zero and then

(3.3)    $l'\hat{t} = l'Q/\theta$

and

(3.4)    $V(l'\hat{t}) = \sigma^2/\theta$.
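As a small numerical illustration of the C-matrix and of the quantity $\theta$ in (3.2), the sketch below uses a toy design that is not taken from the paper: $v = 4$ treatments with every unordered pair forming one block, so $b = 6$ and $k = 2$. The variable names are illustrative only.

```python
import numpy as np
from itertools import combinations

# Hypothetical toy design (an assumption for illustration): all pairs of
# 4 treatments as blocks of size 2.
v, k = 4, 2
blocks = list(combinations(range(v), k))
N = np.zeros((v, len(blocks)))               # incidence matrix, v x b
for j, block in enumerate(blocks):
    N[list(block), j] = 1.0

r = int(N[0].sum())                          # replications of each treatment
C = r * np.eye(v) - (N @ N.T) / k            # C-matrix: r I(v) - NN'/k

l = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2.0)   # normalised contrast, l'l = 1
theta = r - (l @ N @ N.T @ l) / k                     # theta = r - l'NN'l / k

print(np.allclose(C @ np.ones(v), 0.0))      # rows of C sum to zero
print(np.allclose(C @ l, theta * l))         # l is a canonical vector of C
print(theta)                                 # variance of the estimate is sigma^2 / theta
```

In this balanced toy case every normalised contrast is canonical with the same $\theta$, which is why a single contrast suffices for the check.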


LEMMA 3.3. Let $l't$ be a normalised contrast and $\theta$ be given by (3.2). Then each of
the following three conditions implies the other two:


(i) $\theta = r$.

(ii) $N'l = 0$.

(iii) $l't$ is estimable with the minimum variance $\sigma^2/r$.
Then
(3.5)    $l'Q = l'T$.






Hence $l'\hat{t}$, its variance and the sum of squares due to $l't$ are the same as those in
a randomised block design.

LEMMA 3.4. If $l_1't, l_2't, \ldots, l_n't$ are $n$ linearly independent contrasts, such that
the estimate of $l_i't$ ($i = 1, 2, \ldots, n$) is uncorrelated with the estimate of any contrast orthogonal to
all of $l_i't$, and if every normalised contrast of the form $l't = \sum a_i l_i't$ has the same
variance, then any normalised contrast of the form $l't$ is a canonical contrast. Further
any two contrasts $\sum a_i l_i't$ and $\sum b_i l_i't$ are uncorrelated, if they are orthogonal.


4. Factorial experiments. Let $F_1, F_2, \ldots, F_m$ be $m$ factors at $s_1, s_2, \ldots, s_m$
levels respectively. Let the $v = s_1 s_2 \cdots s_m$ treatments be denoted by the levels of
the factors as $(x_1 x_2 \cdots x_m)$, where $x_i$ is the level of the $i$th factor and takes
values $0, 1, \ldots, s_i - 1$. Let $t(x_1 x_2 \cdots x_m)$ be the effect of the treatment
combination $(x_1 x_2 \cdots x_m)$. The contrast $\sum c_{x_1 x_2 \cdots x_m} t(x_1 x_2 \cdots x_m)$ [where
summation is over all the values of $(x_1 x_2 \cdots x_m)$] will be called a contrast belonging
to the interaction $F_{i_1} F_{i_2} \cdots F_{i_q}$, if and only if $c_{x_1 x_2 \cdots x_m}$ is a function of the $q$
levels $x_{i_1}, x_{i_2}, \ldots, x_{i_q}$ only and


$\sum_{x_{i_j} = 0}^{s_{i_j} - 1} c_{x_1 x_2 \cdots x_m} = 0$    for $j = 1, 2, \ldots, q$.

DEFINITION 4.1. “Complete balance” is achieved over an interaction, if
and only if all the normalised contrasts belonging to the same interaction are 
estimated with the same variance. 

DEFINITION 4.2. An experiment will be called a balanced factorial experiment
(BFE), if the following conditions are satisfied: 

(a) Each of the treatments is replicated the same number of times. 

(b) Each of the blocks has the same number of plots. 

(c) Estimates of contrasts belonging to different interactions are uncorrelated 
with each other. 

(d) “Complete balance” is achieved over each of the interactions. 

THEOREM 4.1. A normalised contrast belonging to an interaction is a canonical 
contrast of a BFE. 

The proof of Theorem 4.1 follows from Definition 4.2 and Lemma 3.4. 

In a factorial experiment, it is well known that each treatment effect can be 
expressed in terms of main effects and interactions as given by 


(4.1)    $t(x_1 x_2 \cdots x_m) = \sum_{j=1}^{m} t(F_j)_{x_j} + \sum_{j=2}^{m} \sum_{i=1}^{j-1} t(F_i F_j)_{x_i x_j} + \cdots + t(F_1 F_2 \cdots F_m)_{x_1 x_2 \cdots x_m}$.


The $t(F_i)_{x_i}$ is a constant associated with the main effect of the factor $F_i$ at the
level $x_i$, the $t(F_i F_j)_{x_i x_j}$ is a constant associated with the interaction between
the factors $F_i$ and $F_j$ at the levels $x_i$ and $x_j$ respectively, etc.

In further discussion that follows in Sections 4 and 5, we shall state and prove
results for the first $q$ factors $F_1, F_2, \ldots, F_q$ only, for the convenience of notation.
However, the results are true for any $q$ factors $F_{i_1}, F_{i_2}, \ldots, F_{i_q}$.







The parameters defined in (4.1) are not all linearly independent and satisfy
the following relations:


(4.2)    $\sum_{x_j = 0}^{s_j - 1} t(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q} = 0$    for $j = 1, 2, \ldots, q$.

The estimate of $t(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$ will be denoted by $\hat{t}(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$.
Following the notation used by Zelen [15] let us define S-functions as follows:


(4.3)    $S(t; F_1 F_2 \cdots F_q \mid x_1 x_2 \cdots x_q) = \frac{1}{v} \prod_{j=1}^{q} s_j \sum{}' t(y_1 y_2 \cdots y_m)$,


where $\sum'$ refers to the sum over all treatments which have the same levels
$x_1, x_2, \ldots, x_q$ for the $q$ factors $F_1, F_2, \ldots, F_q$ respectively. Then, it can be
shown that


(4.4)    $S(t; F_1 F_2 \cdots F_q \mid x_1 x_2 \cdots x_q) = \sum_{j=1}^{q} t(F_j)_{x_j} + \sum_{j=2}^{q} \sum_{i=1}^{j-1} t(F_i F_j)_{x_i x_j} + \cdots + t(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$,


(4.5)    $t(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q} = (-1)^q \sum_{w=1}^{q} (-1)^w \{w(t)\}$,


where $\{w(t)\}$ denotes the sum of the functions $S(t; \cdots)$ involving exactly $w$
factors out of the $q$ factors $F_1, F_2, \ldots, F_q$ only.
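A two-factor numerical sketch of the S-functions and of the inversion formula (4.5) may help. It rests on two assumptions that are stated here rather than taken from the text: the S-function is read as the mean of $t$ over all treatments with the given levels, and the treatment effects are centred so that their overall mean is zero. The array names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
s1, s2 = 3, 2                      # levels of two hypothetical factors F1, F2
v = s1 * s2
t = rng.normal(size=(s1, s2))      # t[x1, x2]: made-up treatment effects
t -= t.mean()                      # assumption: overall mean of the effects is zero

# S-functions read as means over the treatments sharing the given levels
S1 = (s1 / v) * t.sum(axis=1)      # S(t; F1 | x1)
S2 = (s2 / v) * t.sum(axis=0)      # S(t; F2 | x2)
S12 = (s1 * s2 / v) * t            # S(t; F1F2 | x1 x2), here simply t itself

# Inversion for q = 2: {1(t)} collects the one-factor S-functions,
# {2(t)} the two-factor one.
q = 2
w1 = S1[:, None] + S2[None, :]
w2 = S12
t12 = (-1) ** q * ((-1) ** 1 * w1 + (-1) ** 2 * w2)

# The two-factor interaction satisfies the marginal relations (4.2) ...
print(np.allclose(t12.sum(axis=0), 0.0), np.allclose(t12.sum(axis=1), 0.0))
# ... and the decomposition into main effects and interaction reproduces t.
print(np.allclose(S1[:, None] + S2[None, :] + t12, t))
```

With $q = 1$ the same formula returns $S(t; F_1 \mid x_1)$ itself, which is why `S1` and `S2` play the role of the main effects in the last check.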

The equations (4.3) and (4.5) give S-functions and factor-interactions in
terms of treatment effects. We define similar functions in terms of the adjusted
treatment totals $Q(y_1 y_2 \cdots y_m)$ as follows:


(4.6)    $S(Q; F_1 F_2 \cdots F_q \mid x_1 x_2 \cdots x_q) = \frac{1}{v} \prod_{j=1}^{q} s_j \sum{}' Q(y_1 y_2 \cdots y_m)$


and

(4.7)    $Q(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q} = (-1)^q \sum_{w=1}^{q} (-1)^w \{w(Q)\}$,


where $\sum'$ is as defined in (4.3) and $\{w(Q)\}$ denotes the sum of S-functions
for $Q$ involving exactly $w$ factors from $F_1, F_2, \ldots, F_q$ only. It can be shown that
the functions defined in (4.6) and (4.7) satisfy the relations exactly similar to
(4.1), (4.2) and (4.4).

LEMMA 4.1. If $\sum c_{x_1 x_2 \cdots x_m} t(x_1 x_2 \cdots x_m)$ is any contrast belonging to the $q$-factor
interaction $F_1 F_2 \cdots F_q$, then it can be expressed as a contrast in terms of the factor
interactions $t(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$ at different levels but belonging to the same $q$-factor
interaction. A similar result holds also for the corresponding Q-functions.

The proof of Lemma 4.1 follows from (4.1) and (4.2) with a little algebra. 


5. Analysis of BFE. The following vectors, matrices, and matrix operators will 
be useful in later results. 





(i) $t(s_i)$ = the column vector $\{t(0), t(1), \ldots, t(s_i - 1)\}$.

(ii) $Q(s_i)$ = the column vector $\{Q(0), Q(1), \ldots, Q(s_i - 1)\}$.

(iii) $t(F_i)$ = the column vector $\{t(F_i)_0, t(F_i)_1, \ldots, t(F_i)_{s_i - 1}\}$.

(iv) $Q(F_i)$ = the column vector $\{Q(F_i)_0, Q(F_i)_1, \ldots, Q(F_i)_{s_i - 1}\}$.

(v) $\lambda(1)$ = the column vector $\{\lambda_0, \lambda_1\}$.

(vi) $\theta(1)$ = the column vector $\{\theta_0, \theta_1\}$.

(vii) $I(m)$ = the $m \times m$ identity matrix.

(viii) $I^*(m)$ = the $m \times m$ matrix obtained by replacing 1 by 0 in the last row
and last column of $I(m)$.

(ix) $E_{mn}$ = the $m \times n$ matrix with all the elements equal to unity.

(x) $E(s)$ = $s^{-1/2} E_{s1}$.

(xi) $L(s)$ = an $s \times (s - 1)$ matrix whose columns are mutually orthogonal
normalised vectors also orthogonal to $E(s)$.

(xii) $M(s)$ = $[L(s) \mid E(s)]$.

(xiii) $N(s)$ = $I(s) - E(s)E'(s) = L(s)L'(s)$.

(xiv) $G(s)$ = [definition not legible in the scanned source].

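The operator "$\times$" denotes the Kronecker product of matrices defined by


(5.1)    $A \times B = [a_{ij}] \times B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}$.


The operator "$\otimes$" denotes the symbolic Kronecker product of subscripts and
suffixes defined by the following illustrations:


$Q(3) \otimes Q(2)$ = the column vector $\{Q(00), Q(01), Q(10), Q(11), Q(20), Q(21)\}$,


$\{Q(F_i)_0, Q(F_i)_1\} \otimes \{Q(F_j)_0, Q(F_j)_1\}$ = the column vector $\{Q(F_iF_j)_{00}, Q(F_iF_j)_{01}, Q(F_iF_j)_{10}, Q(F_iF_j)_{11}\}$.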






THEOREM 5.1. A BFE in $m$ factors $F_1, F_2, \ldots, F_m$ at $s_1, s_2, \ldots, s_m$ levels
respectively is a PBIB with relevant parameters and conversely. Two treatments
are the $p_1 p_2 \cdots p_m$th associates, where $p_i = 1$ if the $i$th factor occurs at the same level
in both the treatments and $p_i = 0$ otherwise; $\lambda_{p_1 p_2 \cdots p_m}$ will denote the number of times
these treatments occur together in a block. Now, if any contrast belonging to the
interaction $F_{i_1} F_{i_2} \cdots F_{i_q}$ is estimated with the variance

(5.4)    $\sigma^2/\theta_{y_1 y_2 \cdots y_m}$,

where

(5.5)    $y_j = 1$ if $j = i_1, i_2, \ldots, i_q$;  $y_j = 0$ otherwise,

then the relation between the $\theta$'s and the $\lambda$'s is

(5.6)    $\theta(1) \otimes \theta(1) \otimes \cdots \otimes \theta(1) = -\frac{1}{k}[G(s_1) \times G(s_2) \times \cdots \times G(s_m)] \cdot [\lambda(1) \otimes \lambda(1) \otimes \cdots \otimes \lambda(1)]$,

where

(5.7)    $\theta_{00 \cdots 0} = 0$ and $\lambda_{11 \cdots 1} = -r(k - 1)$.


The proof of Theorem 5.1 follows from Theorem 6.1 of [12], on substituting
$m_1 = m_2 = \cdots = m_h = 1$ and $h = m$.

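THEOREM 5.2. In a BFE, if a normalised contrast belonging to the interaction
$F_1 F_2 \cdots F_q$ is estimated with the variance $\sigma^2/\theta$, then the estimates of the same
interaction at different levels are given by


(5.8)    $\hat{t}(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q} = \frac{1}{\theta} Q(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$.


PROOF. Using Definition 4.2, Theorem 4.1 and Lemma 3.4, it can be shown
that


(5.9)    $\frac{1}{\theta} H_1 \times H_2 \times \cdots \times H_m \cdot Q(s_1) \otimes Q(s_2) \otimes \cdots \otimes Q(s_m) = H_1 \times H_2 \times \cdots \times H_m \cdot \hat{t}(s_1) \otimes \hat{t}(s_2) \otimes \cdots \otimes \hat{t}(s_m)$,

where

(5.10)    $H_j = L'(s_j)$ if $j = 1, 2, \ldots, q$;  $H_j = E'(s_j)$ otherwise.


By the substitution of treatment effects in terms of main effects and interactions,
the right hand side of the equation (5.9) can be simplified to


(5.11)    $\prod_{j=q+1}^{m} s_j^{1/2} \cdot L'(s_1) \times L'(s_2) \times \cdots \times L'(s_q) \cdot \hat{t}(F_1) \otimes \hat{t}(F_2) \otimes \cdots \otimes \hat{t}(F_q)$.


The left hand side of the equation (5.9) can also be simplified in the same way,
and we obtain


$\frac{1}{\theta} L'(s_1) \times L'(s_2) \times \cdots \times L'(s_q) \cdot Q(F_1) \otimes Q(F_2) \otimes \cdots \otimes Q(F_q) = L'(s_1) \times L'(s_2) \times \cdots \times L'(s_q) \cdot \hat{t}(F_1) \otimes \hat{t}(F_2) \otimes \cdots \otimes \hat{t}(F_q)$.


Then, on introducing the marginal relations (4.2),


(5.12)    $\frac{1}{\theta} M'(s_1) \times M'(s_2) \times \cdots \times M'(s_q) \cdot Q(F_1) \otimes Q(F_2) \otimes \cdots \otimes Q(F_q) = M'(s_1) \times M'(s_2) \times \cdots \times M'(s_q) \cdot \hat{t}(F_1) \otimes \hat{t}(F_2) \otimes \cdots \otimes \hat{t}(F_q)$.


Hence, on multiplying both sides by the Kronecker product of the corresponding
$M$ matrices, (5.12) simplifies to


(5.13)    $\frac{1}{\theta} Q(F_1) \otimes Q(F_2) \otimes \cdots \otimes Q(F_q) = \hat{t}(F_1) \otimes \hat{t}(F_2) \otimes \cdots \otimes \hat{t}(F_q)$.


This proves Theorem 5.2.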

THEOREM 5.3. If, in a BFE, two factor interactions $t(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$ and
$t(F_{i_1} F_{i_2} \cdots F_{i_p})_{y_1 y_2 \cdots y_p}$ do not have all the factors identical, then their estimates are
uncorrelated.

PROOF. It can be seen from (5.9) and (5.13) that the estimates of the factor
interactions are obtained from the contrasts belonging to the corresponding inter- 
actions. In a BFE contrasts belonging to different interactions are uncorrelated 
and hence the estimates of the factor interactions belonging to different inter- 
actions are uncorrelated. 

THEOREM 5.4. If, in a BFE, the variance of any normalised contrast belonging
to the $q$-factor interaction $F_1 F_2 \cdots F_q$ is $\sigma^2/\theta$, then the variance of $\hat{t}(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$
is $\prod_{j=1}^{q}(s_j - 1)\sigma^2/v\theta$ and the covariance between $\hat{t}(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$ and
$\hat{t}(F_1 F_2 \cdots F_q)_{y_1 y_2 \cdots y_q}$, provided exactly $h$ of the $x_j$ are equal to the corresponding
$y_j$, is $(-1)^{q-h} \prod'(s_j - 1)\sigma^2/v\theta$, where $\prod'$ represents the product for those factors
for which $x_j = y_j$.

PROOF. The right hand side of equation (5.9) represents a set of normalised
orthogonal contrasts belonging to the interaction $F_1 F_2 \cdots F_q$. Hence by Lemma
3.4, its variance-covariance matrix is $(\sigma^2/\theta)I$. Consequently, the
variance-covariance matrix of (5.11) can be written as


(5.14)    $\frac{\sigma^2}{\theta} I(s_1 - 1) \times I(s_2 - 1) \times \cdots \times I(s_q - 1)$.


Hence it can be deduced that the variance-covariance matrix of the right hand
side of (5.12) is


(5.15)    $\frac{\sigma^2}{\theta} \prod_{j=q+1}^{m} s_j^{-1} \cdot I^*(s_1) \times I^*(s_2) \times \cdots \times I^*(s_q)$.


Now, on applying

(5.16)    $M(s) I^*(s) M'(s) = L(s) L'(s) = N(s)$,


it follows that the variance-covariance matrix of the right hand side of the
equation (5.13) is


(5.17)    $\frac{\sigma^2}{\theta} \prod_{j=q+1}^{m} s_j^{-1} \cdot N(s_1) \times N(s_2) \times \cdots \times N(s_q)$.


The required expressions for variances and covariances can be obtained from
(5.17).
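The identity (5.16) and the coefficient $\prod (s_j - 1)/\prod s_j$ that it produces on the diagonal can be checked numerically. The sketch below is illustrative: it takes $E(s)$ to be the normalised column of ones, and it uses Helmert-type contrasts as one possible choice of $L(s)$, an assumption made here since the text only requires orthonormal columns orthogonal to $E(s)$.

```python
import numpy as np

def E(s):
    """Normalised s x 1 vector of ones (assumed form of E(s))."""
    return np.ones((s, 1)) / np.sqrt(s)

def L(s):
    """One choice of s x (s-1) orthonormal contrasts: Helmert-type columns."""
    cols = []
    for j in range(1, s):
        c = np.zeros(s)
        c[:j] = 1.0
        c[j] = -j
        cols.append(c / np.sqrt(j * (j + 1)))
    return np.column_stack(cols)

def M(s):
    return np.hstack([L(s), E(s)])

def I_star(s):
    """Identity with the 1 in the last row and column replaced by 0."""
    D = np.eye(s)
    D[-1, -1] = 0.0
    return D

def N(s):
    return np.eye(s) - E(s) @ E(s).T

s = 3
lhs = M(s) @ I_star(s) @ M(s).T
print(np.allclose(lhs, L(s) @ L(s).T), np.allclose(lhs, N(s)))

# Diagonal of N(3) x N(2): each entry is (3 - 1)(2 - 1) / (3 * 2)
K = np.kron(N(3), N(2))
print(np.allclose(np.diag(K), (3 - 1) * (2 - 1) / (3 * 2)))
```

Multiplying that diagonal coefficient by the scalar in front of the Kronecker product gives a variance of the form stated in Theorem 5.4.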

THEOREM 5.5. If, in a BFE, the variance of any normalised contrast belonging to
the interaction $F_1 F_2 \cdots F_q$ is $\sigma^2/\theta$, then the sum of squares due to the same interaction
is given by


(5.18)    $\bigl(\prod_{j=1}^{q} s_j\bigr)^{-1} v\theta \sum \hat{t}^2(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q} = \bigl(\prod_{j=1}^{q} s_j\bigr)^{-1} \frac{v}{\theta} \sum Q^2(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q}$,


where the summation is over all possible values of $(x_1 x_2 \cdots x_q)$. Its expected value is

(5.19)    $\bigl(\prod_{j=1}^{q} s_j\bigr)^{-1} v\theta \sum t^2(F_1 F_2 \cdots F_q)_{x_1 x_2 \cdots x_q} + \prod_{j=1}^{q}(s_j - 1)\sigma^2$,


and it is distributed as a $\sigma^2\chi^2$ variate with $\prod_{j=1}^{q}(s_j - 1)$ degrees of freedom under the
null hypothesis that the interaction $F_1 F_2 \cdots F_q$ is zero at all the levels.

Theorems 5.1 to 5.5 indicate a method of analysis for BFE. This method is 
useful only when estimates of interactions at different levels are required. For 
obtaining the analysis of variance table, a simple course would be to employ the 
method outlined in [13]. 


6. A method of construction. In this section we shall derive a method of con- 
structing a BFE in (m + n) factors from two known BFE’s in (n + 1) factors 
and m factors respectively. 

The method employs replacement of different levels of a factor in one design,
by the distinct sets of treatment combinations forming the blocks of another
design. By the statement, that the level $x_0$ of the first factor in the treatment
$(x_0 x_1 \cdots x_n)$ is replaced by the block (of another design) containing treatments
$(y_{11} y_{12} \cdots y_{1m})$, $(y_{21} y_{22} \cdots y_{2m})$, $\ldots$, $(y_{k1} y_{k2} \cdots y_{km})$, we shall mean that the
treatment $(x_0 x_1 \cdots x_n)$ is replaced by a set of $k$ treatments $(y_{i1} y_{i2} \cdots y_{im} x_1 x_2 \cdots x_n)$,
in (m + n) factors. As an illustration, if the block A contains treatments (120), 
(203), (111) and (112), then the statement that the level 0 of the first factor 
(0120) is replaced by the block A will mean that the treatment (0120) is re- 
placed by a set of 4 treatments (120210), (203210), (111210) and (112210). 
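The replacement rule is mechanical, and a minimal sketch makes it concrete. The function name `replace_first_level` and the use of digit strings for treatments are conventions chosen here for illustration; the input treatment "0210" is picked so that the output agrees with the four six-factor treatments listed in the illustration.

```python
def replace_first_level(treatment, block):
    """Replace the level of the first factor of `treatment` by a whole block.

    treatment : string of levels x0 x1 ... xn, e.g. "0210"
    block     : list of strings y1 y2 ... ym, the treatments of one block
    Returns the k treatments (y1 ... ym x1 ... xn) of the enlarged design.
    """
    rest = treatment[1:]               # levels x1 ... xn are kept unchanged
    return [y + rest for y in block]

block_A = ["120", "203", "111", "112"]
print(replace_first_level("0210", block_A))
```

Each block of the original design thus grows by a factor equal to the size of the substituted block, which is why the constructed design has blocks of $kk^*$ plots.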

Further, for employing this method, we need a known BFE with some specific
properties. We shall assume that there exists a BFE in $m$ factors $F_1, F_2, \ldots, F_m$
at $s_1, s_2, \ldots, s_m$ levels respectively with $s_1 s_2 \cdots s_m = v^*$ treatments, each
replicated $r^*$ times in $b^*$ blocks of $k^*$ plots each, with the incidence matrix


(6.1)    $N^* = [n^*_{ij}] = [A_1 \mid A_2 \mid \cdots \mid A_{b^*}]$.


Further assume that $b^* = pq$, and it is possible to put $pq$ blocks in $p$ groups,
each containing $q$ blocks, in such a way that the design consisting of $p$ blocks
formed by adding together all the blocks of a group is a BFE. Without loss of
generality it can be assumed that the incidence matrix of this BFE is


(6.2)    $N_p^* = \bigl[\sum_{j=1}^{q} A_j \bigm| \sum_{j=1}^{q} A_{q+j} \bigm| \cdots \bigm| \sum_{j=1}^{q} A_{(p-1)q+j}\bigr]$.

It can be seen that for a resolvable design $N^*$, the corresponding design $N_p^*$
exists with $p = r^*$. Another simple example is that the design $N^*$ is a $2^3$ factorial
design in 3 factors $A$, $B$ and $C$, in 4 blocks of two plots each, obtained by
confounding the interactions $AB$, $BC$ and $AC$; and the design $N_p^*$ is the design in 2
blocks of 4 plots each, formed by confounding the interaction $AB$ only.
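The $2^3$ example can be written out explicitly. In the sketch below, which is an illustration and not the author's construction, a treatment is a triple of levels $(a, b, c)$ and blocks are formed by the usual rule of grouping treatments on the parity of the confounded interactions; the helper `group_by` is a name introduced here.

```python
from itertools import product

treatments = list(product((0, 1), repeat=3))        # (a, b, c)

def group_by(key):
    """Group the 8 treatments into blocks according to a key function."""
    blocks = {}
    for t in treatments:
        blocks.setdefault(key(t), []).append(t)
    return list(blocks.values())

# AB, BC and AC confounded: the block is fixed by (a + b, b + c) mod 2.
small = group_by(lambda t: ((t[0] + t[1]) % 2, (t[1] + t[2]) % 2))
# AB only confounded: the block is fixed by a + b mod 2.
large = group_by(lambda t: (t[0] + t[1]) % 2)

for big in large:
    parts = [blk for blk in small if set(blk) <= set(big)]
    print(big, "is the union of", parts)
```

Each of the two large blocks is the union of exactly two of the small blocks, so here $p = 2$ and $q = 2$ in the grouping described above.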

THEOREM 6.1. Let there be a BFE $N$ in $(n + 1)$ factors $F_0, F_{m+1}, F_{m+2}, \ldots,
F_{m+n}$ at $s_0, s_{m+1}, s_{m+2}, \ldots, s_{m+n}$ levels respectively ($s_0 = q$), in $b$ blocks of $k$ plots
each (with $r$ replications). Also let there be two BFE's $N^*$ and $N_p^*$ as given by
(6.1) and (6.2). Now, if the level $j - 1$ of the factor $F_0$ is replaced by the block
$A_{iq+j}$ ($j = 1, 2, \ldots, q$) in each of the treatments of $N$, then the design obtained
by adjoining the $p$ designs so formed (for $i = 0, 1, \ldots, p - 1$) is a BFE in $m + n$
factors with $rr^*$ replications in $bp$ blocks of $kk^*$ plots each.

PROOF: Let the incidence matrix of the BFE in $n + 1$ factors $F_0, F_{m+1},
F_{m+2}, \ldots, F_{m+n}$ be


$N = \begin{bmatrix} N_1 \\ N_2 \\ \vdots \\ N_q \end{bmatrix}$,


where $N_j$ is a matrix of $s_{m+1} s_{m+2} \cdots s_{m+n} = v$ rows corresponding to $v$ treatments in
each of which the factor $F_0$ occurs at the same level $j - 1$, and further that the order
of these $v$ combinations in each of the sub-matrices is the same. Then the incidence
matrix of the constructed design is


$H = \bigl[\sum_{j=1}^{q} A_j \times N_j \bigm| \sum_{j=1}^{q} A_{q+j} \times N_j \bigm| \cdots \bigm| \sum_{j=1}^{q} A_{(p-1)q+j} \times N_j\bigr]$.


Now from Theorem 4.1, it can be shown that


(6.3)    $N_i N_i' = N_j N_j' = U$, say;

(6.4)    $N_i N_j' = N_k N_l' = W$, say, if $i \neq j$ and $k \neq l$.







This is equivalent to the fact that the C-matrix of a BFE is invariant under renaming
of the levels of a factor or is symmetric with respect to all levels of any one of
the factors. From the equations (6.3) and (6.4), we have


(6.5)    $NN' = I(q) \times (U - W) + E_{qq} \times W$.


Let $l'(v)t(v)$ (where $t(v)$ is a column vector representing the $v$ combinations of the
$n$ factors $F_{m+1}, F_{m+2}, \ldots, F_{m+n}$ and $l(v)$ is a $(v \times 1)$ vector) be a normalised
contrast belonging to the interaction $F_{i_1} F_{i_2} \cdots F_{i_g}$ for a design in $n$ factors.
Similarly let $l'(q)t(q)$ be a normalised contrast in the $q$ levels of the factor $F_0$ only.
In the design $N$, let the variances of the estimates of the contrasts
$l'(q) \times l'(v) \cdot t(q) \otimes t(v)$ and $E'(q) \times l'(v) \cdot t(q) \otimes t(v)$ be $\sigma^2/\theta_{1A}$ and $\sigma^2/\theta_{0A}$ respectively;
$\theta_{1A}$ and $\theta_{0A}$ are canonical roots of the C-matrix of $N$ corresponding to the normalised
contrasts belonging to the interactions $F_0 F_{i_1} F_{i_2} \cdots F_{i_g}$ and $F_{i_1} F_{i_2} \cdots F_{i_g}$
respectively. Then from Theorem 4.1 and Lemma 3.1, we have


(6.6)    $NN'[l(q) \times l(v)] = k(r - \theta_{1A})[l(q) \times l(v)]$,
         $NN'[E(q) \times l(v)] = k(r - \theta_{0A})[E(q) \times l(v)]$.


Writing $k(r - \theta_{1A}) = \psi_{1A}$ and $k(r - \theta_{0A}) = \psi_{0A}$, say, and substituting $NN'$
from (6.5), we can deduce that


(6.7)    $U l(v) - W l(v) = \psi_{1A} l(v)$,
         $U l(v) + (q - 1) W l(v) = \psi_{0A} l(v)$.


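Now let $l'(v^*)t(v^*)$ be a normalised contrast belonging to an interaction
$F_{j_1} F_{j_2} \cdots F_{j_h}$ in an $m$ factor-design in $F_1, F_2, \ldots, F_m$ only. Let the variance of
its estimate in the two designs $N^*$ and $N_p^*$ be $\sigma^2/\theta_1$ and $\sigma^2/\theta_2$ respectively; also
let $k^*(r^* - \theta_1) = \psi_1$ and $qk^*(r^* - \theta_2) = \psi_2$. Then by Theorem 4.1 and Lemma
3.1, $l(v^*)$ is a canonical vector of $N^* N^{*\prime}$ and $N_p^* N_p^{*\prime}$; and


(6.8)    $\bigl(\sum_{i=1}^{b^*} A_i A_i'\bigr) l(v^*) = \psi_1 l(v^*)$,
         $\sum_{i=0}^{p-1} \bigl(\sum_{j=1}^{q} A_{iq+j}\bigr)\bigl(\sum_{j=1}^{q} A_{iq+j}\bigr)' l(v^*) = \psi_2 l(v^*)$.


Now, we have


$HH' = \sum_{i=0}^{p-1} \bigl(\sum_{j=1}^{q} A_{iq+j} \times N_j\bigr)\bigl(\sum_{j=1}^{q} A_{iq+j} \times N_j\bigr)'$.


Hence, from (6.4) and (6.5),


(6.9)    $HH' = \sum_{i=0}^{p-1} \bigl\{\bigl(\sum_{j=1}^{q} A_{iq+j}\bigr)\bigl(\sum_{j=1}^{q} A_{iq+j}\bigr)'\bigr\} \times W + \sum_{i=1}^{b^*} A_i A_i' \times (U - W)$.


Therefore


$HH'[l(v^*) \times l(v)] = \sum_{i=0}^{p-1} \bigl\{\bigl(\sum_{j=1}^{q} A_{iq+j}\bigr)\bigl(\sum_{j=1}^{q} A_{iq+j}\bigr)'\bigr\} l(v^*) \times W l(v) + \bigl(\sum_{i=1}^{b^*} A_i A_i'\bigr) l(v^*) \times (U - W) l(v)$.


Applying the results in (6.7) and (6.8), we obtain


(6.10)    $HH'[l(v^*) \times l(v)] = \bigl\{\frac{\psi_2}{q}(\psi_{0A} - \psi_{1A}) + \psi_1 \psi_{1A}\bigr\}[l(v^*) \times l(v)]$.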


From (6.10), it follows that $l(v^*) \times l(v)$ is a canonical vector of the matrix
$HH'$, hence $l'(v^*) \times l'(v) \cdot t(v^*) \otimes t(v)$ is a canonical contrast of the design $H$
and its variance is $\sigma^2/\theta$, where $kk^*(rr^* - \theta) = \frac{\psi_2}{q}(\psi_{0A} - \psi_{1A}) + \psi_1 \psi_{1A}$. Therefore


(6.11)    $rr^* - \theta = (r^* - \theta_2)(\theta_{1A} - \theta_{0A}) + (r^* - \theta_1)(r - \theta_{1A})$.


If the symbol $L$ with the corresponding suffixes denotes the loss of information
(as compared with a randomised block design) in each case, then


(6.12)    $L = L_2(L_{0A} - L_{1A}) + L_1 L_{1A}$.
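The step from (6.11) to (6.12) can be spelled out. The sketch below assumes, as the phrase "as compared with a randomised block design" suggests but the text does not state explicitly, that each loss of information is one minus the ratio of the canonical root to the number of replications of the design concerned:

```latex
L = 1 - \frac{\theta}{rr^*}, \qquad
L_1 = 1 - \frac{\theta_1}{r^*}, \qquad
L_2 = 1 - \frac{\theta_2}{r^*}, \qquad
L_{0A} = 1 - \frac{\theta_{0A}}{r}, \qquad
L_{1A} = 1 - \frac{\theta_{1A}}{r}.

% Divide (6.11) by rr^*:
\frac{rr^* - \theta}{rr^*}
  = \frac{r^* - \theta_2}{r^*} \cdot \frac{\theta_{1A} - \theta_{0A}}{r}
  + \frac{r^* - \theta_1}{r^*} \cdot \frac{r - \theta_{1A}}{r},

% and note that (\theta_{1A} - \theta_{0A})/r = L_{0A} - L_{1A}:
L = L_2 \,(L_{0A} - L_{1A}) + L_1 \, L_{1A}.
```

Under these definitions the loss in the constructed design is therefore a simple bilinear combination of the losses in the component designs.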


The contrasts belonging to an interaction of $(m + n)$ factors can be formed
by the Kronecker product of the contrasts in the $m$ factors and $n$ factors
separately. Hence, from equations (6.10) and (6.11), it follows that every
contrast belonging to the same interaction $F_{j_1} F_{j_2} \cdots F_{j_h} F_{i_1} F_{i_2} \cdots F_{i_g}$ is estimated
with the same variance $\sigma^2/\theta$ in the design $H$; therefore it is a BFE.

Thus Theorem 6.1 is proved. 

The variance of the estimate of the contrast $l'(v^*) \times E'(v)\,t(v^*) \otimes t(v)$ can
be obtained from (6.11) by putting $\theta_{2A} = 0$ and taking $\sigma^2/\theta_{1A}$ as the variance of
the estimate of a normalised contrast belonging to the main effect of $F_0$ in the
design $N$. Similarly, the variance of the estimate of the contrast
$E'(v^*) \times l'(v)\,t(v^*) \otimes t(v)$ can be obtained from (6.11) by putting $\theta_1 = \theta_2 = 0$.

THEOREM 6.2. Let there be a BFE $N_\alpha$ in $(n + 1)$ factors $F_0, F_{m+1}, F_{m+2}, \cdots, F_{m+n}$
at $s_0, s_{m+1}, s_{m+2}, \cdots, s_{m+n}$ levels respectively, in $b$ blocks of $k$ plots each.
Also let there be another BFE $N_\beta$ in $m$ factors $F_1, F_2, \cdots, F_m$ at $s_1, s_2, \cdots, s_m$
levels respectively, in $b^*$ blocks of $k^*$ plots each. If $k^* = s_0$, then on substituting $s_0$
levels of the factor $F_0$ in $N_\alpha$ by $s_0 = k^*$ distinct treatments of a block of $N_\beta$, we
obtain $b$ new blocks corresponding to each of the blocks of $N_\beta$. Then the design
obtained by taking all the $bb^*$ blocks so formed is a BFE in $(m + n)$ factors.

Theorem 6.2 appears to be different from Theorem 6.1. However, on a close
examination, Theorem 6.2 is seen to be a particular case of Theorem 6.1, on
taking $N = N_\alpha$, $N^*$ to be a BFE in $b^*k^*$ blocks of 1 plot each and $N^*_{pq} = N_\beta$
with $p = b^*$ and $q = k^*$. ($N^*$ is a BFE in the sense that information on every
contrast is zero.) From this analogy, the proof of Theorem 6.2 follows exactly
on the same lines as Theorem 6.1.





In Theorems 6.1 and 6.2 we have replaced the levels of the first factor $F_0$.
It is known that by permuting factors and correspondingly rewriting each of the 
treatments the design remains the same; it only means that the treatments are 
given new names. Hence, in practice the replacement as in Theorems 6.1 and 
6.2 can be carried out for any intermediate factor. The proper rearrangement of 
the factors and the renaming of the treatments can be made where necessary. 

There are many BFE's known for the $3^m \times 2^n$ type, but no design is available for
$3^2 \times 2^2$ in blocks of 6 plots each. We shall construct two such designs by the
above method.

EXAMPLE 6.1. If we take $N_\alpha$ equal to the $3 \times 2^2$ design given in Cochran and
Cox ([3], plan 6.9, p. 240), and $N_\beta$ as the $3^2$ BFE in 6 blocks of 3 plots each,
obtained by confounding the first order interaction between the two factors, then,
on applying Theorem 6.2, we obtain a $3^2 \times 2^2$ design in 36 blocks of 6 plots each.

EXAMPLE 6.2. Similarly, if we take $N_\alpha$ equal to the $3^2 \times 2$ design given in
Cochran and Cox ([3], plan 6.11, p. 241), and $N_\beta$ as the $2^2$ BFE in 2 blocks of 2
plots each, obtained by confounding the first order interaction, then, on applying
Theorem 6.2, we obtain a $3^2 \times 2^2$ design in 24 blocks of 6 plots each.
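
The substitution step of Theorem 6.2 can be sketched in a few lines. The second design below is the $2^2$ plan in 2 blocks of 2 plots with the interaction confounded, as in Example 6.2; the first design is a hypothetical toy (a $2 \times 2$ factorial in one complete block), chosen only to keep the output small, and is not the Cochran and Cox plan. The function name `substitute` and the rule "level j of the first factor receives the j-th listed treatment of the block" are assumptions made for illustration.

```python
from itertools import product

def substitute(alpha_blocks, beta_blocks):
    """For every block of the second design and every block of the first,
    replace level j of the first factor by the j-th treatment of that block."""
    new_blocks = []
    for beta in beta_blocks:
        for alpha in alpha_blocks:
            # t[0] is the level of the first factor; beta[t[0]] is a tuple of levels
            new_blocks.append([beta[t[0]] + t[1:] for t in alpha])
    return new_blocks

# 2^2 design in 2 blocks of 2 plots, interaction confounded (k* = 2).
n_beta = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
# Hypothetical toy first design: 2 x 2 factorial in one complete block (b = 1, k = 4).
n_alpha = [[(0, 0), (0, 1), (1, 0), (1, 1)]]

design = substitute(n_alpha, n_beta)
all_treatments = sorted(product((0, 1), repeat=3))
for block in design:
    print(block)
# Each of the 2^3 treatments should occur exactly once over the b * b* = 2 blocks.
print(sorted(t for blk in design for t in blk) == all_treatments)
```

With real plans in place of the toy, the same function produces the $bb^*$ blocks described in the theorem; whether the result is balanced is what the theorem asserts, and the sketch does not check it.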

EXAMPLE 6.3. Take the design $N^*$ to be the following $2 \times 3$ design in 6
blocks of 2 plots each.

Plan of the Design.

Block Number    1    2    3    4    5    6

Treatments     00   01   02   00   02   01
               11   12   10   12   11   10

The blocks 1, 2, 3 and 4, 5, 6 form two complete replications, so we can take
$N^*_{pq}$ as the randomised block with two replications. Now, let us take a $5 \times 3$
design in 20 blocks of 3 plots each, given by Rao ([12], p. 169). Then on applying
Theorem 6.1, we obtain a $2 \times 3 \times 5$ BFE in 40 blocks of 6 plots each ($r = 8$).
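
The counts quoted in Example 6.3 can be checked mechanically. The sketch below verifies that blocks 1 to 3 and blocks 4 to 6 of the plan each contain all six treatments once, and then recomputes the number of blocks, plots per block and replications of the resulting design. The bookkeeping (blocks multiply by the number of groups, plots by the block size of the plan, replications by its replications) is a reading of the construction, not a quotation of Theorem 6.1, and all variable names are illustrative.

```python
from collections import Counter

# The 2 x 3 plan of Example 6.3: six blocks of two plots each.
blocks = [["00", "11"], ["01", "12"], ["02", "10"],
          ["00", "12"], ["02", "11"], ["01", "10"]]
treatments = sorted(a + b for a in "01" for b in "012")

def is_complete_replication(group):
    """True when the blocks of the group contain every treatment exactly once."""
    return sorted(t for blk in group for t in blk) == treatments

rep1_ok = is_complete_replication(blocks[:3])
rep2_ok = is_complete_replication(blocks[3:])
r_star = set(Counter(t for blk in blocks for t in blk).values())  # replications in the plan

# Rao's 5 x 3 design as described in the text: 20 blocks of 3 plots, 15 treatments.
b, k, v = 20, 3, 15
r = b * k // v
p, k_star = 2, 2            # two groups of blocks; block size of the 2 x 3 plan
blocks_H, plots_H, r_H = p * b, k * k_star, r * 2
print(rep1_ok, rep2_ok, r_star, blocks_H, plots_H, r_H)
```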


7. Acknowledgment. The author is grateful to Professor M. C. Chakrabarti 
for his help and guidance. 
REFERENCES
[1] R. C. Bose, "Mathematical theory of the symmetric factorial designs," Sankhyā, Vol. 8
(1947), pp. 107-166.
[2] R. C. Bose and K. Kishen, "On the problem of confounding in general symmetric
factorial designs," Sankhyā, Vol. 5 (1940), pp. 21-26.
[3] William G. Cochran and Gertrude M. Cox, Experimental Designs, 2nd ed., John
Wiley and Sons, New York, 1957.
[4] R. A. Fisher, The Design of Experiments, 4th ed., Oliver and Boyd, Edinburgh, 1947.
[5] R. A. Fisher, "The theory of confounding in factorial experiments in relation to the
theory of groups," Ann. Eugenics, Vol. 11 (1942), pp. 341-353.
[6] Oscar Kempthorne, The Design and Analysis of Experiments, John Wiley and Sons,
New York, 1952.
[7] Tosio Kitagawa and Michiwo Mitome, Tables for the Designs of Factorial Experiments,
Baifukan Co. Ltd., Tokyo, 1953.
[8] Jerome C. R. Li, "Design and statistical analysis of some confounded factorial experiments,"
Iowa Agricultural Experimental Station Research Bulletin, 333 (1934).
[9] K. R. Nair and C. R. Rao, "Confounded designs for asymmetrical factorial experiments,"
Science and Culture, Vol. 7 (1942), pp. 313-314.
[10] K. R. Nair and C. R. Rao, "Confounded designs for the $k \times p^m \times q^n \cdots$ type of
factorial experiments," Science and Culture, Vol. 7 (1942), pp. 361-362.
[11] K. R. Nair and C. R. Rao, "Confounding in asymmetrical factorial experiments," J.
Roy. Stat. Soc., Ser. B, Vol. 10 (1948), pp. 109-131.
[12] C. R. Rao, "A general class of quasifactorial and related designs," Sankhyā, Vol. 17
(1956), pp. 165-174.
[13] B. V. Shah, "On balancing in factorial experiments," Ann. Math. Stat., Vol. 29 (1958),
pp. 766-779.
[14] F. Yates, The Design and Analysis of Factorial Experiments, Imp. Bur. Soil Sci.,
Harpenden, England (1937).
[15] Marvin Zelen, "The use of group divisible designs for confounded asymmetrical
factorial arrangements," Ann. Math. Stat., Vol. 29 (1958), pp. 22-40.





THE FIRST-PASSAGE MOMENTS AND THE INVARIANT MEASURE 
OF A MARKOV CHAIN 


By John Lamperti¹
Stanford University 


We consider an irreducible, recurrent Markov chain with transition probability
matrix $P = [p_{ij}]$. The random variables constituting the chain are $\{X_n\}$;
let $N > 0$ be the smallest positive time $n$ at which $X_n = 0$. Then the quantities

$$E\{N(N - 1)\cdots(N - k + 1) \mid X_0 = i \ne 0\} = \mu_{i0}^{(k)}$$

are the factorial first-passage time moments. In case $i = 0$, we will let $\mu_{00}^{(k)} = \delta_{k0}$.
However, it is also convenient to introduce the actual recurrence-time moments
for state 0:

$$E\{N(N - 1)\cdots(N - k + 1) \mid X_0 = 0\} = \mu_0^{(k)}.$$

Let $\{\pi_i\}$ be the unique positive solution of the equation

$$\pi_j = \sum_i \pi_i p_{ij}, \tag{1}$$

often called the "invariant measure" of the chain. Then this measure and the
first-passage moments are related by the

THEOREM. The equation

$$\pi_0\,\mu_0^{(k+1)} = (k + 1)\sum_i \pi_i\,\mu_{i0}^{(k)} \tag{2}$$

is always valid. (Both sides may be $+\infty$.)

Remarks. If $k = 0$, (2) reduces to the familiar assertion that the mean recurrence
time of state 0 is $\pi_0^{-1}\sum_i \pi_i$. If $k = 1$, (2) is equivalent to a "remarkable
formula" discovered by Chung [1], who gave a proof rather different from that
which follows.
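
For a finite chain the cases $k = 0$ and $k = 1$ of the theorem can be checked with exact rational arithmetic. The sketch below is only an illustration: the three-state matrix `P` is made up, the helper names are hypothetical, and the moments are obtained from the standard linear equations for first-passage times rather than from the generating-function argument of the paper. The invariant measure is normalised by taking its value at state 0 equal to 1.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def check_theorem(P):
    """Return (lhs, rhs) of the identity for k = 0 and for k = 1, with pi_0 = 1."""
    s = len(P) - 1                                # number of states other than 0
    Q = [row[1:] for row in P[1:]]                # transitions among states != 0
    p0 = P[0][1:]                                 # first step from 0 into states != 0
    IQ = [[Fr(int(i == j)) - Q[i][j] for j in range(s)] for i in range(s)]
    IQt = [[IQ[j][i] for j in range(s)] for i in range(s)]
    pi = solve(IQt, p0)                           # pi_j for j != 0 from pi = pi P
    m = solve(IQ, [Fr(1)] * s)                    # m_i = E[N | X_0 = i], i != 0
    Qm = [sum(Q[i][j] * m[j] for j in range(s)) for i in range(s)]
    f2 = solve(IQ, [2 * x for x in Qm])           # E[N(N-1) | X_0 = i], i != 0
    mu1 = 1 + sum(p0[j] * m[j] for j in range(s))             # E[N | X_0 = 0]
    mu2 = sum(p0[j] * (2 * m[j] + f2[j]) for j in range(s))   # E[N(N-1) | X_0 = 0]
    k0 = (mu1, 1 + sum(pi))                                   # k = 0
    k1 = (mu2, 2 * sum(pi[j] * m[j] for j in range(s)))       # k = 1
    return k0, k1

# Hypothetical irreducible three-state chain, used only as a test case.
P = [[Fr(1, 2), Fr(1, 4), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
     [Fr(1, 5), Fr(2, 5), Fr(2, 5)]]
k0, k1 = check_theorem(P)
print(k0, k1)
```

Each returned pair should consist of two equal fractions; a mismatch would indicate an error in the sketch rather than in the theorem.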

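PROOF OF THE THEOREM. We shall use generating functions; let

$$f_{i0}^{(n)} = \Pr\{X_n = 0,\ X_l \ne 0 \text{ for } l < n \mid X_0 = i \ne 0\} = \Pr\{N = n \mid X_0 = i \ne 0\};$$

$$f_{00}^{(n)} = \delta_{n0}; \qquad F_{i0}(x) = \sum_{n=0}^{\infty} f_{i0}^{(n)} x^n.$$

Thus $F_{00}(x) = 1$, and $F_{i0}^{(k)}(1) = \mu_{i0}^{(k)}$ for all $i$ including 0. Similarly we put

$$g_0^{(n)} = \Pr\{X_n = 0,\ X_l \ne 0 \text{ for } 1 \le l < n \mid X_0 = 0\} = \Pr\{N = n \mid X_0 = 0\},$$

and $G_0(x) = \sum_{n=1}^{\infty} g_0^{(n)} x^n$. Notice that $g_0^{(n)} = \sum_j p_{0j} f_{j0}^{(n-1)}$, so that

$$G_0(x) = x \sum_j p_{0j} F_{j0}(x). \tag{3}$$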
Received August 3, 1959; revised November 27, 1959. 


¹ This research was partially supported by the Office of Naval Research.


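Similarly, $f_{i0}^{(n)} = \sum_j p_{ij} f_{j0}^{(n-1)}$ provided that $i \ne 0$; this gives

$$F_{i0}(x) = x \sum_j p_{ij} F_{j0}(x), \qquad i \ne 0. \tag{4}$$

The next step is to multiply by $\pi_i$ in (4) and sum, which yields

$$\sum_{i \ne 0} \pi_i F_{i0}(x) = x \sum_j \sum_{i \ne 0} \pi_i p_{ij} F_{j0}(x) = x \sum_j (\pi_j - \pi_0 p_{0j}) F_{j0}(x)$$

from (1). In view of (3) we have $\sum_i \pi_i F_{i0}(x) - \pi_0 = x \sum_i \pi_i F_{i0}(x) - \pi_0 G_0(x)$,
which, upon differentiating $k + 1$ times, becomes (for $|x| < 1$)

$$(1 - x) \sum_i \pi_i F_{i0}^{(k+1)}(x) + \pi_0 G_0^{(k+1)}(x) = (k + 1) \sum_i \pi_i F_{i0}^{(k)}(x). \tag{5}$$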


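Relation (2) with "$\le$" is immediate from (5), for, letting $x \to 1$,

$$\pi_0\,\mu_0^{(k+1)} = \pi_0 G_0^{(k+1)}(1) \le (k + 1) \sum_i \pi_i F_{i0}^{(k)}(1) = (k + 1) \sum_i \pi_i\,\mu_{i0}^{(k)},$$

since the first term in (5) is non-negative. Now suppose that $G_0^{(k+1)}(1) < \infty$,
but that $\sum_i \pi_i\,\mu_{i0}^{(k)} = \infty$. Then from (5) we obtain

$$\lim_{x \to 1-} \frac{(1 - x) \sum_i \pi_i F_{i0}^{(k+1)}(x)}{\sum_i \pi_i F_{i0}^{(k)}(x)} = k + 1.$$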

It follows by Theorem 2 of [3] that

$$\sum_i \pi_i F_{i0}^{(k)}(x) = \frac{1}{(1 - x)^{k+1}}\, L\Bigl(\frac{1}{1 - x}\Bigr),$$

where $L(y)$ is a slowly varying function.² Integrating $k$ times, we would then
have

$$\sum_i \pi_i F_{i0}(x) = \frac{1}{1 - x}\, L_1\Bigl(\frac{1}{1 - x}\Bigr),$$

with $L_1$ again slowly varying.³ This, however, is inconsistent with the fact that
$\sum_i \pi_i F_{i0}(x)$ is bounded as $x \to 1$. (The assumption $G_0^{(k+1)}(1) < \infty$ excludes the
null-recurrent case, so that $\sum_i \pi_i = \sum_i \pi_i F_{i0}(1) < \infty$.)

We have thus established that $\mu_0^{(k+1)}$ and $\sum_i \pi_i\,\mu_{i0}^{(k)}$ are both finite or both
infinite; assume the former is the case. We are through if it can be shown that
the first term on the left in (5) tends to 0 as $x \to 1-$. This follows at once upon
applying to the function $h(x) = \sum_i \pi_i F_{i0}^{(k)}(x)$ the following simple

LEMMA. Let $h(x)$, $0 < x < 1$, be a positive, monotone increasing, convex function
with a finite limit as $x \to 1-$. Then $\lim_{x \to 1-} (1 - x)h'(x) = 0$.

This fact is obvious upon drawing a diagram, and this completes the proof of the theorem.
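
The diagram argument for the lemma can also be written out in a line or two. The following sketch assumes that $h$ is differentiable on $(0, 1)$, as it is for the power series to which the lemma is applied, and writes $h(1-)$ for the finite limit; it is offered as one possible way to fill in the step, not as the author's own proof.

```latex
\begin{align*}
h(y) &\ge h(x) + h'(x)\,(y - x), && 0 < x < y < 1 \quad \text{(convexity)}, \\
h(1-) - h(x) &\ge (1 - x)\,h'(x) \ge 0 && \text{(let } y \to 1-\text{; } h' \ge 0 \text{ since } h \text{ is increasing)}, \\
\lim_{x \to 1-} (1 - x)\,h'(x) &= 0 && \text{(since } h(x) \to h(1-) < \infty\text{)}.
\end{align*}
```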

S. Karlin has pointed out (in conversation) that the theorem can be proved
in a somewhat different manner, which avoids the use of slowly varying functions.
This has its advantages, but the author confesses to a mild proprietary
pleasure in the argument given above.

² That is, $L(cy)/L(y) \to 1$ as $y \to \infty$ for every $c > 0$.

³ This well-known fact may be easily deduced from the canonical form of the slowly
varying function $L(y)$ [2].


REFERENCES
[1] K. L. Chung, "Contributions to the theory of Markov chains. II," Trans. Amer. Math.
Soc., Vol. 76 (1954), pp. 397-419.
[2] M. J. Karamata, "Sur un mode de croissance régulière," Bull. Math. Soc. France, Vol.
61 (1933), pp. 55-62.
[3] John Lamperti, "An occupation-time theorem for a class of stochastic processes,"
Trans. Amer. Math. Soc., Vol. 88 (1958), pp. 380-387.





NOTES 


NOTE ON THE DISTRIBUTION OF LOCALLY MAXIMAL ELEMENTS 
IN A RANDOM SAMPLE¹


By MARSHALL FREIMER AND BERNARD GOLD 


Lincoln Laboratory, Massachusetts Institute of Technology 


Glasgow's formula for the second factorial moment of this distribution [1] is
considerably more complicated than it need be. We have elsewhere ([2] and [3])
obtained a formula requiring just one summation, over the fixed range
$0 \le s \le k - 1$, thus eliminating the summation over the ever-increasing range
$0 \le s \le m$.

Following Glasgow's notation, let $\beta$ be the number of locally $k$-maximal
elements in a permutation of the first $n$ integers. Our formula, for the variance
of $\beta$, is
var (8B) = (n+ 1)C,, 2k, 
where 
_— Sk + 3) So ee 
(A+ 1)(kK+1)? k++). 55k+s8+4+2° 
Using the expected value of 8 given in {1}, we find that 
E(s8”,n) = var (8) + E(8)(E(8) — 1) 


C; = 


= (n+ 1)Cy + (Qn — k +:1)(2n — 2k)/(k + 1)! 


Both referees have pointed out that Glasgow's formula can be reduced to
ours. In fact, the summation in his equation (3.8) can be performed, yielding


2(m + 1)(7k + 10k + 3 + 4km + 2m) 


ik + 1k + 1)(Qk + m +2) 
REFERENCES 

[1] M. O. Glasgow, "Note on the factorial moments of the distribution of locally maximal
elements in a random sample," Ann. Math. Stat., Vol. 30 (1959), pp. 586-90.

[2] M. Freimer, B. Gold, and A. L. Tritter, "On a mathematical model for a Morse code
translator," Lincoln Laboratory Group Report 34-61, November 1, 1957. (Not
generally available.)

[3] M. Freimer, B. Gold, and A. L. Tritter, "The Morse distribution," IRE Transactions
on Information Theory, IT-5 (1959), pp. 25-31.


Received August 15, 1959 
¹ The work reported in this paper was performed by Lincoln Laboratory, a center for
research operated by Massachusetts Institute of Technology with the joint support of the
U.S. Army, Navy, and Air Force.




CONTRIBUTIONS TO THE THEORY OF RANK ORDER STATISTICS:
COMPUTATION RULES FOR PROBABILITIES OF RANK ORDERS¹


By I. Richard Savage
University of Minnesota


1. Introduction. For most sampling situations the computation of the non-null
probabilities of rank orders involves either difficult multiple integrations or
extensive Monte Carlo sampling ([1], [2], [3]). In this note back-recursive rules are
given for computing the probabilities of rank orders for the one and two sample
problems ([2] (Section 1), [1] (Section 2)). For the one-sample problem the rule
permits the computations for samples of size $n$ from the results with samples of
$n + 1$. For the two-sample problem the rule permits the computations for
samples of size $m$ and $n$ from the results with samples of $m + 1$ and $n$ ($m$ and
$n + 1$). Since most computations done analytically are built up from smaller
to larger sample sizes these results will, for that case, have limited value, e.g.,
in checking numerical work. For Monte Carlo sampling, however, there is no
reason for starting with the smaller samples and in this case the rules will be of
service.


2. One-sample rule. Let $P_n(z)$ be the probability of the rank order
$z = (z_1, \cdots, z_n)$, where $z_i = 0(1)$ if the $i$th smallest of the observed absolute
values was from a negative (positive) observed deviation from a hypothetical
median, e.g., if the observed deviations are $(2.2, -.7, .5, -1.1, 3.0)$, then
$z = (10011)$.

RULE I. To compute $P_n(z)$ add all [$2(n + 1)$ in number] the $P_{n+1}(z^{ij})$ and
divide by $(n + 1)$, where

$$z^{ij} = (z_1, \cdots, i, z_j, \cdots, z_n), \qquad i = 0, 1 \text{ and } j = 1, \cdots, n + 1.$$


Note a. Several of the $z^{ij}$ will be the same.

Note b. The rule can be obtained using the analytic expressions for $P_n(z)$ given
in [2] (equation 1.1). Another proof can be obtained by noting that, after the
sample of size $n$ is formed, an additional observation must fall either between
existing observations or before them or after them.

EXAMPLE I. Numerical results for the one-sample problem are not available.
The following, however, suggests the kind of computing formulas that could be
used. For $n = 3$,

$$P_3(010) = [P_4(1010) + P_4(0010) + P_4(0110) + P_4(0010) + P_4(0110) + P_4(0100) + P_4(0101) + P_4(0100)]/4.$$
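
The bookkeeping in Rule I is easy to mechanise. The sketch below generates the $2(n + 1)$ arguments on the right-hand side for a given rank order and applies the rule to a table of larger-sample probabilities supplied as a function. The function names are illustrative, and the check at the end uses the null case as an assumption (under symmetry about the hypothetical median every rank order of size $n$ has probability $2^{-n}$), not numerical results from the note.

```python
from collections import Counter

def insertions(z, symbols):
    """All rank orders obtained by inserting one symbol at one of len(z) + 1 places."""
    return [z[:j] + s + z[j:] for s in symbols for j in range(len(z) + 1)]

def rule_one(z, p_next):
    """Rule I: sum P_{n+1} over the 2(n + 1) insertions and divide by n + 1."""
    return sum(p_next(w) for w in insertions(z, "01")) / (len(z) + 1)

terms = insertions("010", "01")      # arguments appearing in Example I
print(Counter(terms))                # repeated rank orders show up with their multiplicity

def null_p(w):
    """Null-hypothesis probability of a one-sample rank order of length len(w)."""
    return 0.5 ** len(w)

print(rule_one("010", null_p), null_p("010"))
```

Replacing `null_p` by a lookup into Monte Carlo estimates for samples of size $n + 1$ gives the back-recursive computation described in the text.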

3. Two-sample rule. Let $P_{m,n}(z)$ be the probability of the rank order
$z = (z_1, \cdots, z_{m+n})$ where $z_i = 0(1)$ if the $i$th smallest of the observed values
was from the first (second) sample, e.g., if the observed values in the first sample
were $(-1.5, 2.6)$ and in the second sample $(3.4, -.9)$, then $z = (0101)$.

Received August 28, 1959.

¹ Work done in part under contract Nonr 2582(00), Task NR 042-200 of the Office of
Naval Research.

RULE II. To compute $P_{m,n}(z)$ add all [$(m + n + 1)$ in number] of the
$P_{m+1,n}(z^j)$ and divide by $(m + 1)$, where

$$z^j = (z_1, \cdots, 0, z_j, \cdots, z_{m+n}), \qquad j = 1, \cdots, (m + n + 1).$$

Note a. Several of the $z^j$ will be the same.

Note b. The roles of $m$ and $n$ can be interchanged in the obvious manner.

Note c. The rule can be obtained using the analytic expression

$$P_{m,n}(z) = m!\,n! \int \cdots \int_{-\infty < w_1 < \cdots < w_{m+n} < \infty} \prod_{i=1}^{m+n} \bigl[f^{1-z_i}(w_i)\, g^{z_i}(w_i)\bigr]\, dw_i,$$

where $f(w)$ [$g(w)$] is the density of the first [second] population. Another proof
can be obtained by noting that, after the samples of size $m$ and $n$ have been
obtained, an additional observation from the first population must either be
between a pair of the observations of the original $m + n$ or before or after
them.
EXAMPLE II. For the two-sample problem with $m = 3$ and $n = 2$,

$$P_{3,2}(00011) = [P_{3,3}(100011) + P_{3,3}(010011) + P_{3,3}(001011) + 3P_{3,3}(000111)]/3.$$

Teichroew [3] gives .0394 as the exact value, and .0410 as the Monte Carlo
value (2000 samples) when the two populations are normal with means differing
by ½ of the common standard deviation. Using Teichroew's [3] Monte Carlo
results for $m = 3$, $n = 3$ (4000 samples) in the above formula, one obtains
$P_{3,2}(00011) = [.03250 + .01825 + .011875 + 3(.01675)]/3 = .03992$. Additional
results for $m = 3$, $n = 2$ could be obtained from $m = 4$, $n = 2$ and from $m = 4$,
$n = 3$ via $m = 3$, $n = 3$ [3].
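
Rule II and its interchanged form (Note b) can be sketched in the same way. The code below lists the arguments used in Example II, with their multiplicities, and checks both forms of the rule in the null case, where every rank order with $m$ zeros and $n$ ones has probability $1/\binom{m+n}{m}$. The null case is an assumption chosen because it can be verified exactly; the function names are illustrative, and the Monte Carlo figures quoted in the text are not reproduced here.

```python
from collections import Counter
from fractions import Fraction
from math import comb

def insert_symbol(z, s):
    """Rank orders obtained by inserting symbol s at each of the len(z) + 1 places."""
    return [z[:j] + s + z[j:] for j in range(len(z) + 1)]

def rule_two(z, p_next, s="0"):
    """Rule II for s = '0' (divide by m + 1); Note b's variant for s = '1' (divide by n + 1)."""
    divisor = z.count(s) + 1
    return sum(p_next(w) for w in insert_symbol(z, s)) / divisor

def null_prob(w):
    """Null-hypothesis probability of a two-sample rank order."""
    return Fraction(1, comb(len(w), w.count("0")))

example_terms = Counter(insert_symbol("00011", "1"))   # arguments in Example II
print(example_terms)
print(rule_two("00011", null_prob, "1"), rule_two("00011", null_prob, "0"), null_prob("00011"))
```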
REFERENCES
[1] I. Richard Savage, "Contributions to the theory of rank order statistics - the
two-sample case," Ann. Math. Stat., Vol. 27 (1956), pp. 590-615.
[2] I. Richard Savage, "Contributions to the theory of rank order statistics - the
one-sample case," Ann. Math. Stat., Vol. 30 (1959), pp. 1018-1023.
[3] D. Teichroew, "Empirical power functions for nonparametric two-sample tests for
small samples," Ann. Math. Stat., Vol. 26 (1955), pp. 340-344.



AN INEQUALITY FOR BALANCED INCOMPLETE BLOCK DESIGNS 


By Wadie F. Mikhail
University of North Carolina 


1. Summary. The inequality $b \ge v + r - 1$ for a balanced incomplete block
design was proved by Bose [1] under the assumption of resolvability. In this note
the inequality is proved without that assumption, but with the weaker assumption
that $v = nk$.

Received August 5, 1959; revised October 17, 1959.


2. Introduction. A b.i.b. design is an arrangement of $v$ treatments in $b$ blocks
of size $k < v$ such that (i) every block contains $k$ distinct treatments, (ii) every
treatment occurs in $r$ blocks, and (iii) any two treatments occur together in $\lambda$
blocks.

The parameters satisfy

$$vr = bk, \tag{2.1}$$

$$\lambda(v - 1) = r(k - 1), \tag{2.2}$$

$$b \ge v, \qquad r \ge k. \tag{2.3}$$

The last inequality is due to Fisher [2].

If the blocks can be partitioned into $r$ sets of $n$ blocks each so that in each set
every treatment occurs exactly once, the design is called resolvable. Obviously
then $v = nk$ and $b = nr$, but the converse need not hold. Bose [1] proved that
if a resolvable design with parameters $v, b, r, k, \lambda$ exists, then $b \ge v + r - 1$.


3. Theorem. If a b.i.b. design with parameters $v = nk$, $b$, $r$, $k$, $\lambda$ exists, then

$$b \ge v + r - 1. \tag{3.1}$$

PROOF: Obviously $v > k$ implies $n \ge 2$.

We first prove that $r > k$. Since $r \ge k$, assume on the contrary that $r = k$.
Then from (2.2), $\lambda(nk - 1) = k(k - 1)$. Hence $n\lambda = (k - 1) + \lambda/k$. Since
$n\lambda$ is an integer, $\lambda/k$ is an integer, which is a contradiction since, from (2.2),
$\lambda < r = k$. Hence we have

$$r > k. \tag{3.2}$$

The inequality $b \ge v + r - 1$, under the assumption that $v = nk$, is equivalent
to

$$r \ge (nk - 1)/(n - 1) \tag{3.3}$$

since $n - 1 \ge 1$ is positive. Further, from (2.2), we have

$$r = \lambda(nk - 1)/(k - 1), \tag{3.4}$$

$$n = (r(k - 1) + \lambda)/\lambda k, \tag{3.5}$$

and

$$(k - 1)/(n - 1) = \lambda k/(r - \lambda). \tag{3.6}$$

From (3.3), (3.4), (3.6) we have

$$r - \lambda \ge k. \tag{3.7}$$

It is therefore sufficient to prove (3.7).





Assume that the contrary is true, i.e., $k > r - \lambda$. Put

$$\lambda = r - k + i \tag{3.8}$$

where $1 \le i \le k - 1$, since $\lambda < r$. Substituting in (3.5), we get

$$n = (rk - k + i)/(rk - k^2 + ik).$$

From (3.2), we put $r = k + j$, where $j$ is an integer $\ge 1$, and obtain

$$n = \frac{k}{j + i} + \frac{j - 1}{j + i} + \frac{i}{k(j + i)}. \tag{3.9}$$

Consider (3.9) and assume that $j + i$ divides $k$. Then, since $(j - 1)/(j + i)$ and
$i/k(j + i)$ are both positive proper fractions and $n$ is an integer, we must have
$[(j - 1)/(j + i)] + (i/k(j + i)) = 1$, which implies that $i = -k/(k - 1) < 0$.
This is a contradiction since $i \ge 1$.


Now assume that $j + i$ does not divide $k$. Then, if $k < j + i$, all the terms
on the right hand side of (3.9) are positive proper fractions and

$$\frac{j - 1}{j + i} + \frac{i}{k(j + i)} = \frac{kj - k + i}{kj + ki}, \tag{3.10}$$

which is $< 1$ since $k, j, i$ are all positive. Hence, since $n$ is an integer, the only
possibility is that $n = 1$, which is a contradiction.

Now assume that $k > j + i$ is not divisible by $j + i$. Then $k \equiv m \pmod{j + i}$,
where $1 \le m \le j + i - 1$. In this case (3.9) gives

$$\frac{m}{j + i} + \frac{j - 1}{j + i} + \frac{i}{k(j + i)} = 1. \tag{3.11}$$

Since all the terms are positive proper fractions and the sum of the last two terms
is also a positive proper fraction, (3.11) gives $i = m - 1 + [(m - 1)/(k - 1)]$,
which is a contradiction since $i$ is an integer and $k > j + i > m$ implies
$k - 1 > m - 1$.

Thus (3.8) is contradicted in all cases. Hence obviously $r - \lambda \ge k$ in all cases.

This completes the proof.
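
The reduction of the inequality to $r - \lambda \ge k$ can be checked by brute force over small parameter sets. The sketch below enumerates integer quintuples $(v, b, r, k, \lambda)$ with $v = nk$ that satisfy (2.1) and (2.2), and tests both the inequality and its equivalence with $r - \lambda \ge k$. These are arithmetic parameter sets only: the code makes no claim that a design exists for each of them, and the search bounds and function name are arbitrary choices for illustration.

```python
def admissible(max_v=60, max_lambda=12):
    """Yield (v, b, r, k, lam) with v = n*k, 2 <= k < v, satisfying vr = bk and
    lam*(v - 1) = r*(k - 1) in integers."""
    for k in range(2, max_v):
        for n in range(2, max_v // k + 1):
            v = n * k
            for lam in range(1, max_lambda + 1):
                if (lam * (v - 1)) % (k - 1) == 0:
                    r = lam * (v - 1) // (k - 1)
                    yield v, n * r, r, k, lam       # b = vr/k = nr

rows = list(admissible())
# Parameter sets violating b >= v + r - 1.
violations = [(v, b, r, k, lam) for v, b, r, k, lam in rows if b < v + r - 1]
# Parameter sets on which the inequality and r - lam >= k disagree.
mismatch = [(v, b, r, k, lam) for v, b, r, k, lam in rows
            if (b >= v + r - 1) != (r - lam >= k)]
print(len(rows), len(violations), len(mismatch))
```

Both lists should come out empty within the search range, in agreement with the theorem.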


4. Acknowledgment. I am grateful to Dr. S. S. Shrikhande for kindly going
through the proof and making suggestions.


REFERENCES 


[1] R. C. Bose, "A note on the resolvability of B.I.B.D.", Sankhyā, Vol. 6 (1942), pp.
105-120.

[2] R. A. Fisher, "An examination of the different possible solutions of a problem in
incomplete blocks", Ann. Eugenics, Vol. 10 (1940), pp. 52-75.





ACKNOWLEDGMENT OF PRIORITY 
By G. P. Steck


Two of the examples given in my note, "A Uniqueness Property Not Enjoyed
by the Normal Distribution," Ann. Math. Stat., Vol. 29 (1958), pp. 604-606,
of non-normal distributions where the quotient follows the Cauchy law had
already been given, as consequences of a general result, by J. G. Mauldon,
"Characterizing Properties of Statistical Distributions," Quart. J. Math., Ser.
2, Oxford, Vol. 7 (1956), pp. 155-160.





ABSTRACTS OF PAPERS 


(Abstract of a paper presented at the Washington, D.C., Annual Meeting of
the Institute, December 27-30, 1959.)


76. Moment Generating Functions of Quadratic Forms of Normal Order
Statistics. Harold Ruben, Columbia University.


A general method is derived for obtaining the joint moment generating functions of an
arbitrary set of quadratic functions, not necessarily definite positive, of order statistics
in normal samples. This class of functions probably includes all or most functions of order
statistics likely to be of practical interest, e.g., squared linear functions used in censored
samples and other applications, squared range, squared subrange, squared deviation of
extremes from the sample mean, etc. The determination of the generating functions reduces
to the classic problem of the evaluation of the contents of hyperspherical simplices (the
generalization of the circular arc and spherical triangle).


(Abstracts of papers presented at the Lafayette, Indiana Meeting of 
the Institute, April 7-9, 1960.) 


1. Note on Significances of Differences for Attributes. Irving W. Burr,
Purdue University.


Assuming equal sample sizes and either a Poisson or binomial population, the maximum 
likelihood estimate of the parameter is used. Then the exact probability of a difference 
in “‘defects’’ or “‘defectives’’ at least as large as was observed is obtained by double sum- 
mation. This probability then gives the exact significance levels for various differences and 
sample sizes. A table gives these results up till when the normal curve approximation takes 
over accurately. A quick and accurate approximation for unequal samples is indicated. 


2. A Characterization of Some Location and Scale Parameter Families.
Sudhish G. Ghurye, Northwestern University. (By title)


Zinger (Vestnik. Leningrad. Univ., Vol. 1 (1956), pp. 53-56) has proved the following
result: Let $X_1, \cdots, X_n$, $n \ge 6$, be independent random variables having a common
distribution, which is of continuous type; let $t(X) = (1/n)\Sigma X_i$, $s(X) = [\Sigma X_i^2 - nt^2(X)]^{1/2}$ and
$Y_i = [X_i - t(X)]/s(X)$. If the $Y_i$ are distributed uniformly on the $(n-2)$-dimensional
sphere $\{\Sigma y_i = 0, \Sigma y_i^2 = 1\}$, then the $X$-distribution is normal. I extend this result in an
obvious way to characterize the exponential and rectangular distributions, and also the
multi-variate normal and Wishart distributions. The following result is proved incidentally:
Let $f(x)$ be a measurable function of real $x$, having the property that $x + y + z = a + b + c$
and $x^2 + y^2 + z^2 = a^2 + b^2 + c^2$ imply $f(x)f(y)f(z) = f(a)f(b)f(c)$. If $f(x) \ne 0$ for two values
of $x$, then there exist numbers $\alpha, \beta, \gamma$ such that $f(x) = \alpha \exp(\beta x + \gamma x^2)$ for all $x$.


3. A New Class of Sequential Decision Rules for Symmetric Problems.
William Jackson Hall, University of North Carolina. (By title)


A class of sequential tests is derived for choosing between two symmetric hypotheses
with equal preassigned error probabilities. The class includes the Wald sequential
probability ratio test (SPRT) and numerous other sequential tests. For a number of problems
(including tests on the mean of a normal distribution and a variety of two-population
problems) there is one or more test available with "converging boundaries" (bounded
sample size) in contrast to the "parallel boundaries" (unbounded sample size) of the
SPRT. The relative merits of such tests are investigated, and some extensions to
multiple-decision problems are discussed.


4. Normal Approximation to the Distribution of Two Independent Binomials, 
Conditional on Fixed Sum. J. Hannan, Michigan State University and 
W. Harkness, Pennsylvania State University. (By title) 


For $i = 1, 2$, let $k_i$ be independent binomials with parameters $(N_i - 1, p_i)$ and let
$f_k = \Pr\{k_1 = k \mid \Sigma k_i = c\}$.


Theorem: With (P, , P:) defined by P2q2/Qep: = Piqgi/Qip; and ZN,;P; = ¢ + 1, and with 
H? = Z(N;P,Q;)"' and X; = H(k — NiP; + 4), fe ~ Ho(Xx), Li fe ~ ©(Xg.4) — &(X,.4), 
Dife < Xz'o(X.) or ~1 — ©(X..4), as H and, respectively, HX, HX? and HX), HX, 
or HX; 0. 


5. On the Analysis of Split-Plot Experiments. H. Leon Harter, Wright- 
Patterson Air Force Base, Ohio. 


A crucial question in the analysis of split-plot experiments is whether or not the inter- 
action between subplot treatments and replications should be pooled with the three-factor 
interaction of main plot treatments, subplot treatments, and replications, the result being 
called subplot error. A brief history of the controversy over this question is given, along 
with a rule for deciding, on the basis of a preliminary test of significance, whether or not 
to pool. Several numerical examples are cited, and one of these is worked out in detail. 


6. An Extension of a Theoretical Gene Model to Provide for Genic-environmental
Interaction Terms. Cecil L. Kaller and Virgil L. Anderson,
Purdue University.


A statistical model for the study of quantitative inheritance was introduced by Anderson 
(1953) by utilizing the techniques of factorial experimental models. Kempthorne (1954) 
pursued this further by developing the general gene model in which he used the symbol 
IIj-: Ai,Ai., to denote the genotype of an individual from a population G whose members 
are diploid and have N loci G; , j = 1, 2, --- , N, where locus G; has available h, alleles 
Ai,, i; = 1,2,---, hy, with respective relative frequencies i. oe phy; where 
Di. pi; = 1. By use of algebraic identities and identification of resultant terms with 
genetic effects, Kempthorne provided a complete theoretical gene model. The extension 
of these developments to a general phenotypic model is accomplished by introducing en- 
vironmental factors E,,r = 1, 2, --- , M, where E, has K, “levels” E!, E?,---, z. 
with associated occurrence frequencies p!, p?,--- , pe’, where y aA p: = 1. Then the 
phenotype of any diploid individual is denoted by a symbolic product of genotypic and 
environmental components as P{IT3i,i7™.iviy which is expanded by use of identities and 
symbolic algebraic multiplications into a sum of uncorrelated terms which account for 
all genetic effects, all environmental effects, and all genetic-environmental interaction 
effects contributing to the phenotypic expression of the individual. This is the theoretical
genic-environmental interaction model.





526 ABSTRACTS 


7. Comparison of Estimators for Some Generalized Poisson Distributions.
S. K. Katti, Florida State University, and John Gurland, Iowa State
University.


For the generalized Poisson distributions, Neyman Type A and Poisson generalized 
Pascal, the well-known asymptotically efficient methods of estimation yield highly cumber- 
some equations to solve. In view of this, certain methods have been studied for these distri- 
butions from the point of view of obtaining simple estimators and the joint asymptotic 
efficiency of the estimators evaluated. In the case of Neyman Type A, it is found 
that the estimates of the two parameters obtained by minimising the quadratic form 
(t — r)h-(t — +r)’, where t = (fq; , 2) , log Py), r = (am) , mm , log Po), and & is a
consistent estimate of the covariance matrix of t, have remarkably high efficiency in a wide
region of the parameter space. For the three parameter Poisson generalized Pascal distri- 
bution, the method of using the first two moments and the ratio of the first two frequencies 
looks promising. 


8. Generalization of Thompson's distribution III. Andre G. Laurent, Wayne
State University.


Let the p X N matrix X = (X',--- , X*,--- , X%) be a sample of N vectors X‘ with 
distribution N(BZ‘, 2), i = 1 to N where B is p X q and Z = (Z!,--- , Z, --- , Z¥) is 
q X N of rank q. Let — be a subsample of k vectors, q S k S N — p — q, Z; the correspond- 
ing (Z',--- Z*). Let B, 3, B, , &¢ be the M.L. estimates of B, = obtained with X and ¢ 
respectively. The conditional distribution of £, given the sufficient statistics, B, 3 is 


C\l — (k| N)E7% 
— (1/N)2—°(B, — B)[(Z,.Zi)" — (2Z’)-) > (B, — By |r e-e-vy NE | #2 de 


with C = | ZZ’ |-»/?| ZZ’ — ZZ |-9/te-korTT? r[(N — g + 1 — i)/2)/TI(N —k —q+ 
1 — i)/2] in the proper domain. The conditional distribution of the “‘studentized”’ variable 
n = (N3)-4(¢ — BZ,) is, in the proper domain 


C | 1 — kBy — qwelwe(l — wiwe)we)-hogn’ |-*--2-D dy 


where w; = (ZZ’)~4Z; ; » is independent from 2, B. Formulae simplify when at least one 
of p, q, k, is unity. Applications to estimation problems are given. 


9. An Expansion for the Quadrivariate Normal Integral when $\rho_{13} = \rho_{14} =
\rho_{24} = 0$. J. A. McFadden, Purdue University. (By title) (Introduced
by J. H. Abbott)


Let $\xi_1, \xi_2, \xi_3$, and $\xi_4$ obey a quadrivariate normal distribution with all mean values equal
to zero. Let the correlation coefficient between $\xi_i$ and $\xi_j$ be $\rho_{ij}$, and let $\rho_{ij} = 0$
when $|i - j| > 1$. The value of the quadrivariate normal integral, i.e., the probability
that $\xi_1, \xi_2, \xi_3$, and $\xi_4$ are simultaneously positive, is equal to
$(1/16)\{1 + (2/\pi)[\sin^{-1}\rho_{12} + \sin^{-1}\rho_{23} + \sin^{-1}\rho_{34}] + W(\rho_{12}, \rho_{23}, \rho_{34})\}$, where


W (pi2 , p22 » ps) = (4/x*)pr2puDo (4) mers (m!) Gn (012) Gm (pas); 


G(x) = F (4,4 +m; $; 2%); (a)m = a(a +1) --- (€ + m—1); (@)o = 1. Go(z) is expressible 
in terms of an aresine function, and the other G,, (z) can be written as products of (1 — z*)!-" 
and polynomials of degree m in z*; thus the series is well suited for computation. Numerical 
values from the first four terms compare well with known, exact values of the quadrivariate 
normal integral. 





ABSTRACTS 527 


10. An Expansion for the Quadrivariate Normal Integral for a Stationary
Markov Process. J. A. McFadden, Purdue University. (Introduced by
J. H. Abbott)


Let £ , & , &, and & be successive measurements from a stationary Gaussian Markov 
process, with the mean value equal to zero. Let the correlation coefficient between £; and 
&; be pi; . The value of the quadrivariate normal integral, i.e., the probability that & , & , 
&; , and & are simultaneously positive, is equal to (¢¢){1 + (2/r)[sin~' pis + sin™ ps + 
sin~* pa + sin~* (pr2p23) + sin~' (prs) + sin (prspssess)] + Worn, om, on)|, where 
W (pie , p22, pus) = (4/«*) pispas De (4) m(—}) el (4) me) 0 29® (mm!) Pm (012) Fm (pase) ; F(z) = 
F(a, 4 — m;  — m; 2); @)n = afa + 1) --+ (@ + m — 1); (do = 1. Fo(z) is 
expressible in terms of an arcsine function, and the other F,,(z) can be written as products 
of (1 — z*)' and polynomials of degree m in z*; thus the series is well suited for computation. 
Numerical values from the first four terms compare well with known results obtained by 
numerical integration. 


11. On Evaluation of Negative Binomial Distribution Function. G. P. Partin, 
University of Michigan. (By title) 

In this paper we show that, in order to evaluate the negative binomial distribution
function $Y(r, p, k) = \sum_{x=0}^{r} \binom{k+x-1}{x} p^k (1 - p)^x$, where $0 \le p \le 1$, $0 < k < \infty$, we
can use (positive) binomial distribution function tables when k is a positive integer. More
generally, we show that we can use the incomplete beta function tables for any k. Thus
there is no need for separate numerical tables of the negative binomial distribution
function, since extensive tables are available for the binomial distribution function and
the incomplete beta function. To be precise, we establish Theorem 1:
$Y(r, p, k) = 1 - B(k - 1, p, r + k)$, $k = 1, 2, 3, \ldots$, where

$B(c, p, n) = \sum_{x=0}^{c} \binom{n}{x} p^x (1 - p)^{n-x}$.

Also Theorem 2: $Y(r, p, k) = I_p(k, r + 1)$, $0 < k < \infty$, where

$I_p(m, n) = \frac{1}{B(m, n)} \int_0^p u^{m-1} (1 - u)^{n-1}\,du$.

Incidentally, one gets from the above the well-established identity between the binomial
distribution function and the incomplete beta function, namely $B(k - 1, p, r + k) =
1 - I_p(k, r + 1) = I_{1-p}(r + 1, k)$.
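The two theorems can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes the negative binomial is parameterized by the number of failures x before the k-th success, and the function names and the Simpson-rule integrator are illustrative choices.

```python
from math import comb, factorial, gamma

def neg_binomial_cdf(r, p, k):
    """Y(r, p, k): sum over x = 0..r of Gamma(k+x)/(Gamma(k) x!) p^k (1-p)^x.

    The gamma form equals C(k+x-1, x) for integer k and also covers non-integer k > 0.
    """
    return sum(gamma(k + x) / (gamma(k) * factorial(x)) * p**k * (1 - p)**x
               for x in range(r + 1))

def binomial_cdf(c, p, n):
    """B(c, p, n): sum over x = 0..c of C(n, x) p^x (1-p)^(n-x)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

def incomplete_beta_ratio(p, m, n, steps=2000):
    """I_p(m, n) by composite Simpson's rule (assumes m >= 1, n >= 1, steps even)."""
    h = p / steps
    f = lambda u: u**(m - 1) * (1 - u)**(n - 1)
    total = f(0.0) + f(p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    beta = gamma(m) * gamma(n) / gamma(m + n)
    return total * h / 3 / beta

# Theorem 1 (integer k) and Theorem 2 (any k > 0), at illustrative values.
print(neg_binomial_cdf(4, 0.3, 3), 1 - binomial_cdf(2, 0.3, 7))
print(neg_binomial_cdf(4, 0.3, 2.5), incomplete_beta_ratio(0.3, 2.5, 5))
```

In each printed pair the two values should agree up to rounding and integration error.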


12. On Some Extensions of Sampling with Probability Proportional to Size.
D. K. Ray-Chaudhuri, Case Institute of Technology.


Consider a finite population $\Pi$ consisting of N units $U_1, U_2, \ldots, U_N$. Let Y denote the
variate under inquiry and X denote an auxiliary variate related to Y. Let $X_i$ ($X_i > 0$)
denote the value of X for $U_i$, which is assumed to be known, $i = 1, 2, \ldots, N$. Sampling with
probability proportional to size (PPS) is an efficient method of utilizing the supplementary
information provided by X for the purpose of estimating $\bar{Y}$, the population mean of Y,
only if Y is approximately proportional to X in a certain sense. Several extensions of PPS
sampling have been obtained which give efficient ways of utilizing the supplementary
information provided by X even when Y is approximately any linear function of X. A derived
unit $W_{ij}$ is defined to be a pair of original units $(U_i, U_j)$ where $X_i > \bar{X}$ and $X_j < \bar{X}$, and
$\bar{X}$ denotes the mean of X. In one of the sampling schemes considered a number of derived
units is selected with probability proportional to $|X_i - \bar{X}| + |X_j - \bar{X}|$. These extensions
of PPS sampling are compared with other sampling schemes and the method is generalized
to the case when Y is approximately a quadratic function of X.
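The construction of derived units can be sketched as follows. This is an illustration only: the abstract does not say whether derived units are drawn with or without replacement, so drawing with replacement is an assumption here, and the function names are hypothetical.

```python
import random

def derived_units(x):
    """Pair each unit with X above the mean with each unit with X below it.

    Returns the index pairs (i, j) and the selection weights
    |X_i - mean| + |X_j - mean| described in the abstract.
    """
    mean_x = sum(x) / len(x)
    above = [i for i, xi in enumerate(x) if xi > mean_x]
    below = [j for j, xj in enumerate(x) if xj < mean_x]
    pairs = [(i, j) for i in above for j in below]
    weights = [abs(x[i] - mean_x) + abs(x[j] - mean_x) for i, j in pairs]
    return pairs, weights

def select_derived_units(x, n_draws, rng=None):
    """Draw derived units with probability proportional to their weights.

    Assumption: draws are made with replacement.
    """
    rng = rng or random.Random()
    pairs, weights = derived_units(x)
    return rng.choices(pairs, weights=weights, k=n_draws)

pairs, weights = derived_units([1, 2, 3, 10])
print(pairs, weights)
```

Units whose X equals the mean exactly belong to no derived unit under this reading of the definition.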


13. Application to Stochastic Processes of a Uniqueness Property of the
Rectangular Distribution. Herman Rubin, Michigan State University.


If the random variable X has a t-th moment $\mu(t)$ for all $t \in (-1, 1)$, and for all $t \in (-1, 0)$
we have $(t + 1)\mu(t) = -t\mu(t + 1)$, then X/(1 + X) is rectangular (0, 1). This can be
shown by observing that $\mu(t) \sin \pi t/\pi t$ is periodic of period 1 and bounded in the strip
$|\mathcal{R}(t)| \le \tfrac{1}{2}$ by $A + B|\sin \pi t|$, and hence is constant. Let Y be a process with independent
increments, stationary on both sides of a value $\alpha$. If $Z(\lambda)$ is the likelihood ratio for $\alpha = \lambda$
against $\alpha = 0$ and is positive almost surely, and $\alpha = 0$, then
$\int_0^\infty Z(\lambda)\,d\lambda \big/ \int_{-\infty}^{\infty} Z(\lambda)\,d\lambda$ is
rectangular (0, 1). This follows from the recursion formulas for the moments (of all orders
less than 1) of $\int_0^\infty Z(\lambda)\,d\lambda$ and $\int_{-\infty}^{0} Z(\lambda)\,d\lambda$, and an application of the preceding theorem.


14. Test for Regression Coefficients when Errors are Correlated. M. M. Siddiqui,
National Bureau of Standards, Boulder, Colorado.


In a previous paper (Ann. Math. Stat., Vol. 29 (1958), pp. 1251-56) the variances and 
covariances of least-squares estimates of regression coefficients were obtained when the 
errors are assumed to be correlated. In this paper it is shown that the usual test statistic 
for a regression coefficient is approximately distributed as ct, where c is a constant and t 
is a Student variate with h degrees of freedom. h is a number determined by the covariance 
matrix of errors. 


15. Joint Distribution of Medians in Samples from a Bivariate Population.
M. M. Siddiqui, National Bureau of Standards, Boulder, Colorado.
(By title)


Let F(x, y) be the joint distribution function of (X, Y), possessing a pdf f(x, y). A random
sample $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is drawn, n odd. Let $\tilde{X}$ and $\tilde{Y}$ denote the medians
of the sets $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$, respectively. The joint distribution of $(\tilde{X}, \tilde{Y})$ is
obtained and it is shown that it tends to $N(\xi, \Sigma)$ as $n \to \infty$, where $\xi = (\xi_1, \xi_2)'$ and

$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho^* \sigma_1 \sigma_2 \\ \rho^* \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$.

Here $\xi_1$ and $\xi_2$ are the medians of the marginal pdf's $f_1(x)$ and $f_2(y)$ of
X and Y respectively, $4n f_1^2(\xi_1)\sigma_1^2 = 1$, $4n f_2^2(\xi_2)\sigma_2^2 = 1$, and $\rho^* = 4\theta - 1$, where $\theta = F(\xi_1, \xi_2)$.
As a corollary it is shown that $F(\tilde{X}, \tilde{Y})$ is asymptotically normally distributed with
mean $\theta$ and variance c/n, where c is a constant depending on the parameters of F. Generalization
to the distribution of the median vector in samples from multivariate populations is
obvious.
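As a worked illustration of the limiting covariance matrix, the sketch below evaluates it from the two marginal densities at their medians, the joint probability $\theta$, and the sample size n. The function name and arguments are hypothetical; the example values assume two independent standard normal variates, for which $f_i(\xi_i) = 1/\sqrt{2\pi}$ and $\theta = 1/4$.

```python
from math import pi, sqrt

def median_asymptotic_cov(f1_at_median, f2_at_median, theta, n):
    """Limiting covariance matrix of the two sample medians.

    Uses 4 n f_i(xi_i)^2 sigma_i^2 = 1 and rho* = 4 theta - 1,
    where theta = F(xi_1, xi_2).
    """
    var1 = 1.0 / (4.0 * n * f1_at_median**2)
    var2 = 1.0 / (4.0 * n * f2_at_median**2)
    rho_star = 4.0 * theta - 1.0
    cov = rho_star * sqrt(var1 * var2)
    return [[var1, cov], [cov, var2]]

# Independent standard normals: each variance is pi/(2n), covariance is 0.
f0 = 1.0 / sqrt(2.0 * pi)
print(median_asymptotic_cov(f0, f0, 0.25, 101))
```

At the other extreme, X = Y gives $\theta = 1/2$ and hence $\rho^* = 1$, as one would expect.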


16. A Characterization of the Uniform Distribution in Compact Topological
Groups. James H. Stapleton, Michigan State University.


Let $\Gamma$ be a connected compact topological group with a countable basis. Let $X_1, X_2, \ldots, X_n$
be independently and uniformly distributed (I.U.D.) in $\Gamma$ (the distribution of the n-tuple
is the Haar measure in $\Gamma^n$). Define $Y_i = \sum_{j=1}^{n} a_{ij} X_j$ ($i = 1, \ldots, n$) for integers
$a_{ij}$. Then the $Y_i$ are I.U.D. if and only if $(a_{ij})$ is non-singular. In a sense this characterizes
the uniform distribution in $\Gamma$. Let $X_1, \ldots, X_n$ be independent, and suppose that for no j
does $X_j$ take all its values in a fixed coset of a proper compact subgroup of $\Gamma$. Let $Y_1, \ldots, Y_n$
be as before, assume at least two $a_{ij}$ are non-zero for each i, and let $\det(a_{ij}) = \pm 1$.
Then, if $Y_1, \ldots, Y_n$ are independent, each $X_j$ which has an absolutely continuous part
with respect to the Haar measure is uniformly distributed in $\Gamma$. The proofs make use of the
theory of characteristic functions for compact topological groups.


17. Some Results in the Analysis of Variance I. (Preliminary Report) Selig
Starr, George Washington University. (By title)


Using the finite model (Model III) for the nested case, it is shown that the expected 
values of certain quadratic forms in the observations can be expressed simply in terms of 
the same quadratic forms in the population values. The usual expected mean squares are 
obtained as an immediate consequence. Using Model III for a complete n-factor asymmetrical
factorial, without replication, it is shown that the usual mean squares based on observations
can be expressed in terms of the sums of squares of all possible $2^n$ factorials that can
be formed from the array. This result is used to develop the usual expected mean squares.
The factorial with replication is then derived by a combination of the two foregoing results.
The approach in both cases uses only the simplest combinatorial considerations and does
not involve the expectations of cross-products usually encountered. Matrix algebra simplifies
the presentation and, in the case of the factorial, leads to the Kronecker product of n
simple $2 \times 2$ matrices. The results are proved rigorously, by induction, for the general case.
It is then shown that the development by the usual linear models is a natural consequence. 


18. Power of Some Two Sample Distribution Free Tests. B. V. Sukhatme,
Michigan State University. (By title)


A two sample distribution free test based on the number of observations of one sample 
lying outside the extreme values of the other sample was first proposed by Wilks (1942) and 
its probability distribution was later tabulated by Rosenbaum (1953). Kamat (1956) pro- 
posed another two sample distribution free test based on the numbers of observations of 
each of the two samples lying outside the extreme values of the other sample. This paper 
gives the exact distributions of the two test statistics both under the hypothesis and the 
alternative. These results are used to compare the power of these two tests against scale
alternatives for small samples from normal populations for different levels of significance.
A discussion is also given concerning the relative efficiency of these tests with respect to the 
variance ratio F test. 
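The counts on which both tests are based are easy to compute. The sketch below only counts observations outside the other sample's extremes; how Kamat's statistic combines the two counts is not stated in the abstract, so no combination is attempted, and the function name is hypothetical.

```python
def count_outside(sample, reference):
    """Number of observations in `sample` lying outside the extreme values of `reference`."""
    lo, hi = min(reference), max(reference)
    return sum(1 for v in sample if v < lo or v > hi)

x = [0.1, 2.5, 3.0, 9.9]
y = [1.0, 2.0, 4.0]

# One-directional count, the basis of the Wilks/Rosenbaum type of test.
print(count_outside(x, y))
# Counts in both directions, the raw ingredients of a Kamat type of test.
print(count_outside(x, y), count_outside(y, x))
```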


19. Nonparametric Tests for Location and Scale Parameters in a Mixed Model
with Discrete and Continuous Variables. Shashikala B. Sukhatme,
Michigan State University. (By title) (Introduced by B. V. Sukhatme)


Let $Z_1, Z_2, \ldots, Z_N$ with $Z_i = (X_i, Y_i)$ be independent observations from a bivariate
population. Let the random variable X assume two values 1 and 0 with probabilities p and
$1 - p$ respectively. Let $P(Y \le y \mid X = j) = F_j(y)$, (j = 0, 1). This paper considers the
problem of testing the hypothesis H: $F_0 = F_1$ against the alternative A: $F_0 \ne F_1$. Several
nonparametric tests for location (e.g. two sample median and Wilcoxon tests, etc.) and for 
dispersion (e.g. rank test) have been proposed and their asymptotic properties investigated 
in the case when p is known. In the case when p is unknown, the test statistics are modified 
by replacing p by its usual estimator and it has been proved that some of the tests based 
on the modified statistics are asymptotically distribution free. The generalisation to the 
case when the random variable X has a multinomial distribution is also considered. 







20. Efficiencies of Estimators of Scale and Location Parameters Constructed
From Order Statistics of Censored Samples. J. A. Tischendorf, Bell
Telephone Laboratories. (Invited Paper)


Estimators of the location and/or scale parameters of distribution functions with p.d.f.'s
of the form $g(x) = \sigma^{-1} f[(x - m)/\sigma]$ are constructed from k order statistics where the sample
size n is large. The order statistics are the sample quantiles corresponding to the specified
constants $0 < \lambda_1 < \cdots < \lambda_k < 1$. The estimators are unbiased, linear and of minimum
variance for the particular set of $\lambda_i$'s, $i = 1, \ldots, k$. Necessary conditions for an optimum
spacing of $\lambda_1, \ldots, \lambda_k$ are given for distributions satisfying certain continuity and
differentiability conditions. This optimum spacing may be approximated by a relatively simple,
graphical procedure in each of the three cases, estimating the location parameter, the scale 
parameter, or both parameters. Upper bounds on the efficiencies of these estimators are 
obtained. These bounds may be interpreted with respect to the ordered sample in such a 
way as to also yield upper bounds on the efficiencies of such estimators when the large 
sample is a censored one. Interesting comparisons of estimation situations can be made 
for the case where time is the random variable, i.e., censoring is on the right. 


21. Some New Single Level Continuous Sampling Plans. (Preliminary Report)
John S. White, Aero Division, Minneapolis-Honeywell Reg. Co. (By
title)


Generalizing the methods of Dodge (Ann. Math. Stat. Vol. 14 (1943) pp. 264-279 and Ind. 
Qual. Cont. Vol. 7, pp. 7-11) some new single level continuous sampling plans are given. The 
procedure for these plans is as follows: (a) At the outset inspect, in succession, 100% of the 
units produced until i units in succession are found clear of defects. (b) When i successive 
units are found clear of defects, discontinue 100% inspection, and inspect only a fraction 
f of the units. (c) When a defect is found, revert to 100% inspection until either a second 
defect is found or until m successive units have been found clear of defects. (1) If a defect 
is found before m successive units have passed inspection, revert to 100% inspection as 
per (a). (2) If no defect is found, revert to sampling inspection at rate f. (i) If a defect is 
found in the next k sample units inspected, revert to 100% inspection as per (a). (ii) If no 
defect is found in the next k sample units, continue sampling until a defect is found and
then proceed as in (c). Tables have been computed giving AOQL, i and f values correspond- 
ing to various values of k and m. 
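Steps (a) through (c) describe a small state machine, which can be sketched as below. This is one reading of the procedure, not the author's code: the state names are invented, sampling at rate f is modeled as inspecting each unit independently with probability f, and "the next k sample units" is taken to mean the next k units actually inspected.

```python
import random

def run_plan(defective, i, f, m, k, rng=None):
    """Trace the plan over a stream of units.

    defective: iterable of booleans (True = defective unit).
    Returns a list of (state, inspected) pairs, one per unit, where state is the
    phase in force when the unit arrived:
      "full"      100% inspection as in (a)
      "sampling"  fraction-f inspection as in (b)
      "check"     100% inspection after a defect, as in (c)
      "probation" fraction-f inspection for the next k sample units, as in (c)(2)
    """
    rng = rng or random.Random()
    state, run, sampled = "full", 0, 0
    trace = []
    for bad in defective:
        sampling_phase = state in ("sampling", "probation")
        look = (rng.random() < f) if sampling_phase else True
        trace.append((state, look))
        if not look:
            continue
        if bad:
            # A defect under ordinary sampling starts (c); any other defect restarts (a).
            state = "check" if state == "sampling" else "full"
            run = 0
        elif state == "full":
            run += 1
            if run >= i:
                state = "sampling"                 # step (b)
        elif state == "check":
            run += 1
            if run >= m:
                state, sampled = "probation", 0    # step (c)(2)
        elif state == "probation":
            sampled += 1
            if sampled >= k:
                state = "sampling"                 # step (c)(2)(ii)
    return trace

stream = [False, False, True, False, False, False, False, True]
print([s for s, _ in run_plan(stream, i=2, f=1.0, m=2, k=2)])
```

With f = 1.0 every unit is inspected, which makes the state transitions deterministic and easy to follow by hand.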


22. Existence of Wald's Sequential Test in the General Case. Robert A.
Wijsman, University of Illinois.


A sequential probability ratio test (SPRT) for choosing between two hypotheses $H_i$,
i = 1, 2, is defined by the acceptance intervals $I_i$. Let $u = \sup I_1$, $v = \inf I_2$, $u \le v$. In
order to cope with discrete distributions, define a randomized SPRT R(s, t), with error
probability vector $\alpha(s, t)$, $s = (u, \lambda)$, $t = (v, \mu)$, $0 \le \lambda, \mu \le 1$, as follows: If u < v, u is
included in or excluded from $I_1$ with probabilities $\lambda$ and $1 - \lambda$; v is included in or excluded
from $I_2$ with probabilities $1 - \mu$ and $\mu$. If u = v, then $\mu \ge \lambda$ and u is included in $I_2$, in $I_1$,
or in neither, with probabilities $1 - \mu$, $\lambda$, and $\mu - \lambda$. The existence proof in the continuous
case (Ann. Math. Stat. Vol. 29 (1958) p. 938) remains formally valid in the general case if
s and t are considered elements of a space Z of points z = (x, y), $0 < x < \infty$, $0 \le y \le 1$,
with the points (0, 1) and $(\infty, 0)$ added. Define a linear ordering: $z_1 < z_2$ if $x_1 < x_2$, or $x_1 = x_2$
and $y_1 < y_2$. A topology for Z is generated by sets of the form $z < z_1$ or $z > z_2$. If f is
continuous on Z, if $a, b \in Z$ and c is a number between f(a) and f(b), then f(z) = c for some z,
$a \le z \le b$. This is applied to the functions $\alpha_i(s, t)$ for fixed s or t, and $\alpha_i(s, s)$, which are
continuous and monotonic on Z. Let $C = \{\alpha(s, s): s \in Z\}$ and let A be the closed set in the
$\alpha$-plane bounded by C and the coordinate axes. Then, if $\alpha^* \notin A$, there is no solution to
$\alpha(s, t) = \alpha^*$, and if $\alpha^* \in A$ there is an essentially unique solution. This solution has
optimum property, hence is admissible, provided $u \le 1 \le v$. In any case, the solution has
optimum property among all solutions which take at least one observation. For any $\alpha^*$ with
$\alpha_1 + \alpha_2 \le 1$ there is a test with error probability vector $\alpha^*$, possessing optimum property,
in the form of a mixture of $R_1$, $R_2$ and R(s, t) for some s, t with $u \le 1 \le v$, where $R_i$ accepts
$H_i$ without any observation.


(Abstracts of papers presented at the Eastern Regional Meeting, Columbia 
University, April 21-28, 1960.) 


1. Transition Probabilities for Telephone Traffic. V. E. Benes, Bell Telephone 
Laboratories and Dartmouth College. (By title) 


A stochastic process N(t), representing the number (out of a total N) of telephone trunks
that are in use, is defined by the conditions that arrivals form a renewal process, and that
holding-times of calls have a negative exponential distribution. The transition probabilities
of the (not necessarily Markov) process N(t) are determined in terms of their Laplace
transforms (i) by augmenting N(t) to be a suitable Markov process, and (ii) directly by
using the regeneration points of N(t). The practical relevance of the transition probabilities
to traffic measurement is described.


2. Efficient Sequential Estimators With High Precision Only in a Small
Interval. Allan Birnbaum, New York University.


The requirement that an estimator $\theta^* = \theta^*(x)$ of a real-valued parameter $\theta$ have high
precision in a small interval $[\theta_1, \theta_2]$ can be formulated in part thus: The probability that
$\theta^*$ be closer to $\theta_1$ than to $\theta_2$ when $\theta_1$ is true, and the probability that $\theta^*$ be closer to $\theta_2$ than
to $\theta_1$ when $\theta_2$ is true, should equal or exceed specified lower bounds $1 - \alpha$, $1 - \beta$ respectively.
In many problems such specifications cannot be met by an estimator based on a single
observation. If sequential sampling is allowed, these requirements can be met most
efficiently, in the sense of minimizing the expected sample size under all values of $\theta$, by use of
the sampling rule of Wald's sequential probability ratio test of $\theta_1$ against $\theta_2$ at strength
$(\alpha, \beta)$ under general conditions met in common examples. On the resulting sample space
$\{x\}$, the stated requirements are met efficiently by every estimator which takes values
exceeding $\theta' = (\theta_1 + \theta_2)/2$ on points x where the corresponding sequential test would reject
$\theta_1$, and values less than $\theta'$ on other points. The definition of the estimator can be completed
to make it admissible. The description of such estimators is simple when there is no "excess
at termination" (or when excess is ignored): let t = t(x) = 1/n if x is a "rejection" point
based on n observations, let t = -1/n if x is an "acceptance" point based on n observations,
and let $\theta^*$ be any monotone function of t meeting the above condition. In problems with
suitable symmetry, $\theta^*$ can be determined thus so as to be a median-unbiased admissible
estimator. Admissibility is proved by noting that the class of estimators first described are
(sequential) Bayes solutions, and by determining within this class a unique Bayes solution
for another a priori distribution.


3. Partially Balanced Arrays. I. M. Chakravarti, University of North Carolina
and Indian Statistical Institute. (Introduced by David B. Duncan)


Earlier (1956) this author had defined partially balanced arrays as follows: an array
involving n factors $F_1, F_2, \ldots, F_n$, each at s levels, is partially balanced if, for any group
of d factors ($d \le n$), a combination of levels of the d factors, $F_{1 i_1}, F_{2 i_2}, \ldots, F_{d i_d}$, occurs
$\lambda_{i_1 i_2 \cdots i_d}$ times, where $\lambda_{i_1 i_2 \cdots i_d}$ remains the same for all permutations of a given set
$(i_1, i_2, \ldots, i_d)$ of levels and for all groups of d factors chosen out of n, $i_j$ ranging from 0
to (s - 1) for all j. Then it can be easily shown that this property also holds for any k < d.
Examples of partially balanced arrays are given. These arrays require fewer assemblies than
the corresponding orthogonal arrays for estimating the effects of interest; but the estimates
are not mutually uncorrelated. For s = 2, it is shown that a class of partially balanced arrays
is derivable from the well-known ($\lambda$-$\mu$-$\nu$) configurations.


4. Extensions of the Poisson and the Negative Binomial Distribution. A.
Clifford Cohen, Jr., The University of Georgia.


In biological studies which involve fitting the Poisson or the negative binomial distribu- 
tion to counts of organisms, considerable disparity is often encountered between observed 
and expected frequencies in the zero class. This paper concerns the addition of a selection 
parameter to these distributions in order to alleviate this difficulty. Maximum likelihood 
estimators of the original and the added selection parameters are derived. Asymptotic 
variances and covariances are given and illustrative examples are included. 


5. Asymptotic Variance as an Approximation to Expected Loss for Maximum
Likelihood Estimates. William D. Commins, Jr., Alexandria, Va.


From bounded estimation loss functions which are approximately parabolic when the
estimate is near the parameter $\theta$, Chernoff (Ann. Math. Stat., Vol. 27, pp. 1-22) defines a
normalized loss function. For an estimate based on a large number n of observations, the
normalized expected loss is generally sandwiched between the variance $\sigma^2(\theta)$ of the asymptotic
distribution of the estimate (the asymptotic variance) and the expected squared error
(normalized). This paper is a proof under suitable restrictions that, for the maximum likelihood
estimate $T_n$, the normalized expected loss converges to the asymptotic variance, which
can be smaller than the limit of expected squared error (normalized). The proof of the
conjecture resolves into a proof that $\lim_{n \to \infty} nP(|T_n - \theta| > K) = 0$ for any K > 0 and
$\lim_{n \to \infty} \int_{|T_n - \theta| \le K} n(T_n - \theta)^2\,dP = \sigma^2(\theta)$ for small K > 0. The proof that the first limit holds is a
modification of Wald’s proof (Ann. Math. Stat., Vol. 20, pp. 595-601) that the maximum 
likelihood estimate is consistent. The analysis of the integral involves a modification of the 
standard proof that the maximum likelihood estimate is asymptotically normal. The multi- 
parameter case is treated separately but analogously. 


6. Multi-Stage Bayesian Lot-by-lot Sampling Inspection. Herbert B. Eisenberg,
System Development Corp. (Introduced by Herbert T. David)


Based on the work of Arrow, Blackwell, and Girshick (Econometrica, 1949), this paper 
develops the theory for constructing Bayesian multi-stage (that is, single, double, multiple, 
and sequential) attribute sampling plans for finite lot size, arbitrary profit function, and 
arbitrary a priori lot quality distribution. Using a linear profit function, this theory is 
applied to the following a priori lot quality distributions: binomial, two-point, degenerate 
one-point, discrete mixed binomial, and continuous mixed binomial. Parametric and dis- 
tributional conditions under which sampling never pays are discussed. Profit efficiencies of 
single, double, and multiple sampling plans relative to sequential plans can be computed. 
Effect of optimizing with respect to the wrong lot quality distribution is considered. 







7. A Representation of the Bivariate Cauchy Distribution. Thomas S. Ferguson,
U.C.L.A. and Princeton University. (By title)


A pair of random variables, (X, Y), is said to have a bivariate Cauchy distribution if
every linear combination, aX + bY, has a (one-dimensional) Cauchy distribution (possibly
degenerate). The main theorem proved is the following: A function, $\psi(u, v)$, is the logarithm
of the characteristic function of a bivariate Cauchy distribution if, and only if, $\psi(u, v) =
iau + ibv - g(u, v)$ where (1) a and b are real numbers, (2) g(u, v) is a real, non-negative,
and positive-homogeneous function of degree one (i.e. $g(tu, tv) = |t|\,g(u, v)$ for all real values
of t, u and v), and (3) the set $\{(u, v): g(u, v) \le 1\}$ is convex. The relation between this
representation and that found in Lévy's book, Théorie de l'addition des variables aléatoires,
1937, is discussed. It is shown that this theorem does not extend directly to higher
dimensions: namely, that in three dimensions, there are convex sets symmetric with respect to the
origin which cannot be obtained as a set $\{(u, v, w): g(u, v, w) \le 1\}$.


8. A Noiseless Comma-free Coding Theorem. Thomas S. Ferguson, UCLA
and Princeton University.


A unique feature of noiseless coding theorems is unique decipherability of the code. 
However, this decipherability is unique only when one knows on which of the members of 
the infinite sequence of incoming symbols new words start. Ordinarily, wrong guesses as to 
the starting position may be detected only through the "nonsense" of the decoded sequence.
This drawback may be avoided through the use of so-called comma-free codes (Golomb, 
Gordon, and Welch, Can. Jour. Math. 1958). It is shown that one can achieve the same 
asymptotic average transmission rate under the stronger restriction that the code be 
comma-free and uniquely decipherable. 
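For readers unfamiliar with the comma-free property, the sketch below checks it for a dictionary of equal-length words, following the fixed-word-length definition used by Golomb, Gordon, and Welch: no word of the dictionary may appear straddling the boundary of two adjacent words. The codes treated in the abstract may be more general, and the function name is hypothetical.

```python
def is_comma_free(words):
    """True if no codeword occurs as an overlap of two concatenated codewords.

    Assumes a non-empty dictionary of words of one common length n.
    """
    code = set(words)
    lengths = {len(w) for w in code}
    if len(lengths) != 1:
        raise ValueError("all words must have the same length")
    n = lengths.pop()
    for a in code:
        for b in code:
            joined = a + b
            # Offsets 1..n-1 are the windows that straddle the word boundary.
            for shift in range(1, n):
                if joined[shift:shift + n] in code:
                    return False
    return True

print(is_comma_free(["001", "011"]))  # no overlap window is a codeword
print(is_comma_free(["01", "10"]))    # "0101" contains "10" across the boundary
```

With such a dictionary, a receiver that starts at the wrong position never sees a valid word, so the word boundaries can be recovered from the symbol stream alone.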


9. Inference About Non-Stationary Markov Chains. Ruth Z. Gold, Columbia
University.


Extending results of Anderson and Goodman (Ann. Math. Stat., Vol. 28 (1957), pp. 89-110),
we consider N (large) observations taken at times 0, 1, 2, ..., T on a finite non-stationary
Markov chain in which the transition probabilities are specified functions of a set
of unknown parameters. By methods analogous to those of Neyman ("Contribution to the
theory of the $\chi^2$-test," Proceedings of the Berkeley Symposium on Mathematical Statistics and
Probability, University of California Press, Berkeley, 1949, pp. 239-274), best asymptotically
normal estimates and tests of hypotheses are derived for these parameters. We also
show that certain $\chi^2$ expressions arising in Markov chains with arbitrary transition
probabilities can be decomposed into a sum of squares of asymptotically independent normal
variables with 0 means and unit variances after the manner of "partitioning" proposed by
Lancaster (Biometrika, Vol. 36 (1949), pp. 117-129) despite the fact that in Markov chains 
the number corresponding to the number of observations in a contingency table is a random 
variable. A method of finding joint asymptotic confidence intervals for linear combinations 
of transition probabilities as well as of probabilities in independent sequences of multi- 
nomial trials analogous to that used in the analysis of variance is suggested. 


10. A Central Limit Theorem for Systems of Regressions. E. J. Hannan, 
University of North Carolina. (Introduced by David B. Duncan) 


The theory of regression on fixed variables, when the residuals are generated by a stationary
process, has been illuminated by the introduction of certain restrictions on the
regressor vectors by Grenander. It is the purpose of the paper to show that, for a reasonably
wide class of stationary residuals, these conditions are sufficient to ensure that the
estimates of the regression coefficients are asymptotically normal. The case of a multiple
system of regressions is considered.


11. Power Functions for the Test of Independence in 2 × 2 Contingency Tables.
William Harkness, Pennsylvania State University.


A unified treatment for testing for independence in 2 × 2 tables is given. Using the uniformly
most powerful test for independence in each of the three 2 × 2 tables, as determined
by the number of restrictions on the marginal totals, a comparison of the exact power func- 
tion for each test is made. Using an asymptotic normal approximation to the distribution of 
two independent binomials, conditional on fixed sum, asymptotic power is examined. The 
adequacy of the non-central chi-square approximation to power for small sample sizes 
(n = 10, 20, and 30) is considered, with exact values of power having been calculated. The 
availability of these exact values makes it possible to evaluate the adequacy of other 
approximations, particularly Patnaik’s [Biometrika, Vol. 35, pp. 157-175] and Sillitto’s 
[Biometrika, Vol. 36, pp. 347-352] approximations to the power for the test of equality of two 
binomial parameters. The normal approximation theorem shows Patnaik’s results are based 
on erroneous considerations. The asymptotic results are similar to those of Mitra [Ann. 
Math. Stat., Vol. 29, pp. 1121-1234]. 


12. The Partition of Phenotypic Variance Based on the Genic-environmental
Interaction Model. Cecil L. Kaller and Virgil L. Anderson, Purdue
University.


The genic-environmental interaction model in population genetics is developed by a 
direct extension of Kempthorne’s (1954) theoretical gene model to include environmental 
factors. Both the genetic and the environmental factors are considered as the main factors 
in a factorial experimental design. This permits the separation of all orders of interaction
terms. By employing algebraic identities and extensive algebraic manipulations, a complete 
phenotypic model is developed which is the sum of terms to account for all environmental 
and all genetic main effects influencing phenotypic expression as well as terms for all pos- 
sible orders of genic-environmental interactions. Each effect is accounted for by a different 
term in the model, and all terms are shown to be uncorrelated. Hence for a population 
described by this model, the total phenotypic variance, $\sigma_P^2$, may be readily partitioned into
a sum of components due to the various effects. Any changes made in the assumptions or the 
existence of interactions in the genic-environmental interaction model are reflected imme- 
diately in the variance partition by merely adding or dropping components of the sum. 


13. A Robust Approximate Confidence Interval for Components of Variance. 
Howarp Levene, Columbia University. 


Let $x_{ij} = \mu + y_i + z_{ij}$, $i = 1, \ldots, I$; $j = 1, \ldots, J$, with $E(y_i) = E(z_{ij}) = 0$, $\mathrm{Var}(y_i) =
\sigma_y^2$, $\mathrm{Var}(z_{ij}) = \sigma^2$. The classical F test for testing $H_0$: $\sigma_y^2 = 0$ is exact for normality of the
y and z, and is robust. Previously suggested confidence intervals for $\sigma_y^2$ are approximate,
and strongly affected by non-normality. (See e.g. Scheffé, The Analysis of Variance.) I give
a robust method for testing equality of variances in Contributions to Probability and
Statistics: Essays in Honor of Harold Hotelling. In the same spirit let
$V_i = I(x_{i\cdot} - x_{\cdot\cdot})^2 (I - 1)^{-1} - \sum_j (x_{ij} - x_{i\cdot})^2 J^{-1}(J - 1)^{-1}$,
where $x_{i\cdot}$ and $x_{\cdot\cdot}$ denote the group and grand means. Then $E(V_i) = \sigma_y^2$, the $V_i$ have a
common variance and they have a positive correlation of order $I^{-2}$. An ordinary Student's t
test may be used on the $V_i$ to test $H_a$: $E(V_i) = \sigma_y^2 = a$ for any a and hence to obtain
confidence intervals. However for a = 0 the F test should be used. If $I \ge 10$, and probably for
even smaller values, the V test is generally satisfactory, while for very small I any confidence
interval is unsatisfactorily long. The above method can be extended to r-way classifications,
and, less satisfactorily, to unbalanced 1-way classifications.


14. An Inequality for Balanced Incomplete Block Designs. Wadie F. Mikhail,
University of North Carolina. (By title)


Consider a Balanced Incomplete Block Design (B.I.B.D.) with parameters v, b, r, k, $\lambda$,
where b is the number of blocks, r is the number of replications of each treatment, and
$\lambda$ is the number of times a pair of treatments occur together in a block. It was proved by
Bose (Sankhya, Vol. 6, 1942) that if the B.I.B.D. is resolvable, then $b \ge v + r - 1$. The
present paper shows that the condition of resolvability is not necessary and that the above
inequality holds under the weaker condition v = nk where n is an integer greater than 1.


15. Markov Renewal Processes of Zero Order. Ronald Pyke, Columbia
University. (Invited Paper)


A Markov Renewal process (M.R.P.) determined by (m, A, Q), where m is the number
of states, A is the 1 × m vector of initial probabilities and Q is the matrix of transition
distributions $Q_{ij}(t)$, is said to be of zero order if for every i, $Q_{ij}(t) = Q_{ik}(t)$ for all j, k and
t > 0. The general theory of M.R.P.'s simplifies considerably in this case, and the author is
able to give more explicit results pertaining to first passage times, stationary distributions,
and limit theorems for the number of visits to specified states. An example of a zero order
M.R.P. which arises in counter theory is worked out in detail.


16. On Centering Infinitely Divisible Processes. Ronald Pyke, Columbia
University.


The concept of centering stochastic processes having independent increments, introduced
by Lévy, is applied to processes having both stationary and independent increments (i.e. to
Infinitely Divisible (I.D.) processes). The question of what centering functions preserve
the stationarity of the increments is studied. It is shown that for an I.D. process, there
exists a unique centering function c satisfying c(s + t) = c(s) + c(t) for all $s, t \ge 0$ and
c(1) = 0, such that the resulting centered process is also an I.D. process. A proof of this
result which does not use the Lévy-Khintchine representation of the characteristic function
of an infinitely divisible random variable is given.


17. The Asymptotic Power of the Kolmogorov Tests of Goodness of Fit. Dana
Quade, University of North Carolina.


Let $F_n(x)$ be the empirical distribution function of a random sample from some continuous
distribution function $G_n(x)$. Then the (two-sided) Kolmogorov test of the hypothesis that
$G_n(x) = H(x)$ rejects if $\sup_x n^{1/2} |F_n(x) - H(x)| \ge Q_n$. Let $Z_n(t) = n^{1/2}(F_n[G_n^{-1}(t)] - t)$
and $S_n(t) = n^{1/2}(H[G_n^{-1}(t)] - t)$. Then the power of the test is

$P_n = 1 - \mathrm{pr}\{\sup_t |Z_n(t) - S_n(t)| < Q_n\}$.

As n increases, and $\alpha$ is kept fixed, $Q_n$ approaches a limit Q, and $Z_n(t)$ becomes a certain
Wiener process Z(t). Suppose that $S_n(t)$ also approaches a limit S(t). Then, extending
Donsker's justification of Doob's "heuristic procedure", some sufficient conditions that
$\lim_{n \to \infty} P_n = 1 - \mathrm{pr}\{\sup_t |Z(t) - S(t)| < Q\}$ are given. The one-sided test can be treated
similarly. Upper and lower bounds on the asymptotic power of both tests against the class
of all possible sequences $\{G_n(x)\}$ such that $\lim_{n \to \infty} \sup_t |S_n(t)| = \Delta$, and various subclasses
of this class, are exhibited. Finally, some numerical examples for the case where $G_n(x)$
consists in a translation of H(x) are provided: in particular, it is shown that for detecting
shifts in the mean of a normal population the one-sided test has an asymptotic efficiency of
roughly .6 to .7.
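The two-sided test statistic itself is simple to compute. A minimal sketch follows, assuming a continuous hypothesized distribution function H supplied as a Python callable; the function name is hypothetical, and the critical value $Q_n$ must still be taken from tables.

```python
from math import sqrt

def kolmogorov_statistic(sample, H):
    """sqrt(n) * sup_x |F_n(x) - H(x)| for a continuous hypothesized cdf H.

    The supremum is attained at a sample point, just before or just after
    the jump of the empirical distribution function F_n.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        h = H(x)
        d = max(d, i / n - h, h - (i - 1) / n)
    return sqrt(n) * d

uniform_cdf = lambda x: min(max(x, 0.0), 1.0)
print(kolmogorov_statistic([0.1, 0.5, 0.9], uniform_cdf))
```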


18. Some Results on Error-Correcting Non-Binary Codes. D. K. Ray-Chaudhuri,
Case Institute of Technology.


Consider a communication channel which can transmit $p$ symbols where $p$ is a prime
number. A group code for such a channel with $n$ places, of which $k$ are information places, is
called an $(n, k)$ $p$-ary code. A matrix $M$ with elements in a field is said to possess the
property $(P_t)$ if no $t$ rows of the matrix are dependent. If there is an $n \times (n - k)$ matrix $M$ with
elements in $GF(p)$, the Galois field containing $p$ elements, which possesses the $(P_{2t})$-property,
then there exists a $t$-error-correcting $(n, k)$ $p$-ary code. Let $m$ be the least integer such that
for some integer $c$, $cn + 1 = p^m$. Let $r(j)$ denote the number of distinct residue classes mod
$n$ among the integers $j, pj, p^2 j, \ldots, p^{m-1} j$, $(j + 1), p(j + 1), p^2 (j + 1), \ldots, p^{m-1} (j + 1)$, $\ldots$,
$(j + 2t - 1), p(j + 2t - 1), p^2 (j + 2t - 1), \ldots, p^{m-1} (j + 2t - 1)$. Using the theorem on the
$(P_{2t})$-property, a $t$-error-correcting $(n, k)$ $p$-ary code is constructed with $k = n - r(j)$.
The result is extended to the case when $p$ is a prime power.
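
The count $r(j)$ can be illustrated numerically. The following Python sketch is not the author's construction; it only enumerates the residues $p^i (j + s) \bmod n$ for $s = 0, \ldots, 2t - 1$ and $i = 0, \ldots, m - 1$, assuming $n$ and $p$ are relatively prime so that the least $m$ with $p^m \equiv 1 \pmod n$ exists. All function names are hypothetical.

```python
from math import gcd


def multiplicative_order(p, n):
    """Least m >= 1 with p**m congruent to 1 mod n (assumes n >= 2, gcd(p, n) == 1)."""
    if n < 2 or gcd(p, n) != 1:
        raise ValueError("need n >= 2 and gcd(p, n) == 1")
    m, power = 1, p % n
    while power != 1:
        power = (power * p) % n
        m += 1
    return m


def residue_count(j, p, n, t):
    """r(j): number of distinct residues mod n among p**i * (j + s),
    for s = 0, ..., 2t - 1 and i = 0, ..., m - 1."""
    m = multiplicative_order(p, n)
    residues = {(pow(p, i, n) * (j + s)) % n
                for s in range(2 * t)
                for i in range(m)}
    return len(residues)


def information_places(j, p, n, t):
    """k = n - r(j), the number of information places stated in the abstract."""
    return n - residue_count(j, p, n, t)
```

For example, with $p = 3$, $n = 26$, $t = 2$ and $j = 1$ the residues are 1, 3, 9, 2, 6, 18, 4, 12, 10, so $r(1) = 9$ and $k = 17$.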


19. Concerning Achievement of the Lower Bound for the Power of the
Kolmogorov-Smirnov Test of Fit. Judah Rosenblatt, Purdue University.


A lower bound for the power of the Kolmogorov-Smirnov test, as a function of distance
from the null hypothesis, $F_0$, is easy to compute, using the fact that for fixed $x$, $nF_n(x)$
has the binomial distribution with parameters $n$ and $p = F(x)$. A natural question which
arises is whether there is any distribution function $F$, with $\sup_x |F(x) - F_0(x)| = l$, for
which the computed lower bound for power is achieved. It is shown that if the asymptotic
theory of these tests is conservative, then for those alternatives satisfying the condition
$l \le .1$ and for which the computed lower bound for power exceeds .95, there is a distribution
function $F$ which comes close to achieving this lower bound, when the asymptotic probability
of Type I error does not exceed .05.
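
A bound of this kind can be sketched as follows. The argument assumed here is that the test rejects when $\sup_x |F_n(x) - F_0(x)| \ge d$, so it certainly rejects whenever $|F_n(x_0) - F_0(x_0)| \ge d$ at a single point $x_0$; since $nF_n(x_0)$ is binomial with parameters $n$ and $p = F(x_0)$, that probability is a finite binomial sum. The function name, the critical value `d`, and the numbers in the note below are illustrative assumptions, not taken from the abstract.

```python
from math import comb


def ks_power_lower_bound(n, d, p0, p):
    """P(|B/n - p0| >= d) for B ~ Binomial(n, p).

    n  : sample size
    d  : critical value of the test (assumed to be supplied by the user)
    p0 : F_0(x0), the null distribution function at a fixed point x0
    p  : F(x0), the alternative distribution function at the same point
    """
    total = 0.0
    for j in range(n + 1):
        # Keep only the outcomes at which the one-point deviation already rejects.
        if abs(j / n - p0) >= d:
            total += comb(n, j) * p**j * (1 - p) ** (n - j)
    return total
```

With $n = 100$ and the commonly tabulated asymptotic 5 per cent value $d = 1.358/\sqrt{n}$, a call such as `ks_power_lower_bound(100, 0.1358, 0.5, 0.7)` bounds the power against any alternative with $F(x_0) = .7$ at a point where $F_0(x_0) = .5$.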


20. On the Admissibility of a Class of Tests in Normal Multivariate Analysis. 
S. N. Roy and W. F. Mikhail, University of North Carolina.


This paper proves the admissibility of (i) the largest root test, under a normal multi- 
variate linear model, for a linear hypothesis against the general linear alternative, (ii) the 
largest root test for independence between a p-set and a q-set of variates (having a (p + q)-
variate normal distribution) against $\Sigma_{12} \neq 0$, where $\Sigma_{12}$ is the covariance matrix between the
p-set and the q-set, and (iii) the largest or smallest root test for the equality of two dis- 
persion matrices against certain types of one-sided alternatives. In each of the cases (i), 
(ii) or (iii), the test has an acceptance region which is the intersection of a class of regions, 
and the proof depends upon showing that if the acceptance region is to be proved inadmiss- 
ible by a rival, then that rival region must be contained in every member of the class of 
regions just mentioned, or, in other words, the rival itself must coincide with the acceptance 
region of the proposed test. Hotelling’s $T^2$ is a very special case of (i) and the usual multiple
correlation test is a special case of (ii). The same kind of proof, with some modifications, 
would go through for the corresponding $\lambda$-criteria.





21. On Dependent Tests in Analysis of Variance. S. N. Roy and P. R. Krishnaiah,
University of North Carolina.


Let $F_i = S_i^2 / S^2$ for $i = 1, \ldots, k$ be $k$ F-statistics to test the null hypotheses $H_{0i}$,
where $S_i^2$ is the mean square due to $H_{0i}$ and $S^2$ is the error mean square in ANOVA.
Ramachandran (these Annals, 1956) solved the distribution problems connected with the
simultaneous test of $H_{01}, \ldots, H_{0k}$ when $S_1^2, \ldots, S_k^2, S^2$ are independently distributed.
The present paper extends the above results to the situations where $S_1^2, \ldots, S_k^2$ are
not independently distributed.


22. Lower Bounds on the Probability Associated with Certain Confidence Regions 
for the Multivariate Median. (Preliminary Report) Ernest M. Scheuer,
Space Technology Laboratories. (Invited paper) 


Consider a $k$-dimensional random variable $(X_1, \ldots, X_k)$ having unique median
$(\nu_1, \ldots, \nu_k)$. (Def.: $\nu_i = \mathrm{med}\, X_i$.) Take a sample of size $n$ of this random
variable: $(x_{11}, \ldots, x_{k1}), \ldots, (x_{1n}, \ldots, x_{kn})$ and order the values $x_{i1}, \ldots, x_{in}$ to yield
$x_i(1) \le x_i(2) \le \cdots \le x_i(n)$, $i = 1, \ldots, k$. Select positive integers $r_i$ such that $2r_i < n$
($i = 1, \ldots, k$) and form the set


$$R = \{(t_1, \ldots, t_k): x_i(r_i) < t_i < x_i(n - r_i + 1),\ i = 1, \ldots, k\}.$$


We ask (*) “what is the probability $P$ that $R$ covers $(\nu_1, \ldots, \nu_k)$?” The answer (for $k > 1$)
depends on the joint distribution of $(X_1, \ldots, X_k)$, but sharp lower bounds over all distributions
having unique medians have been obtained for $P$ (a) by Dunn (Ann. Math. Stat.,
Vol. 30 (1959), pp. 192-197) for the case $k = 2$ and $r_1 = r_2 = r$ (say); and (b) by the present
author in this paper for the cases $3 \le k \le 7$ and $r_1 = \cdots = r_k = 1$. The results under (b)
can be summarized in the inequality P 2 1 — 2k(4)"+ 4 (8k — 7) (4)* — 16 (k — 3) (4)*. 
It is conjectured that this result is true for all k > 3. A result on the general problem (*) 
which, while not sharp, may prove to be quite satisfactory is given by the simple result 
$P \ge 1 - \sum_{i=1}^{k} [1 - P(x_i(r_i) < \nu_i < x_i(n - r_i + 1))]$. This formula is useful in that the
terms in square brackets are readily obtainable from tables of the incomplete beta dis- 
tribution or of the cumulative binomial distribution. 
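
The simple bound above lends itself to direct computation. The sketch below assumes each marginal distribution is continuous, so that the number $B$ of observations falling below the marginal median is binomial with parameters $n$ and $1/2$ and each bracketed coverage probability equals $P(r_i \le B \le n - r_i)$. The function names are illustrative.

```python
from math import comb


def marginal_coverage(n, r):
    """P(x(r) < median < x(n - r + 1)) = P(r <= B <= n - r), B ~ Binomial(n, 1/2).

    Assumes a continuous marginal distribution (no ties at the median).
    """
    if r < 1 or 2 * r >= n:
        raise ValueError("need r >= 1 and 2r < n")
    return sum(comb(n, j) for j in range(r, n - r + 1)) / 2**n


def coverage_lower_bound(n, ranks):
    """Lower bound 1 - sum_i [1 - marginal coverage for r_i] on the joint coverage."""
    return 1.0 - sum(1.0 - marginal_coverage(n, r) for r in ranks)
```

For $r_1 = \cdots = r_k = 1$ the bound reduces to $1 - 2k(1/2)^n$; for instance `coverage_lower_bound(10, [1, 1, 1])` evaluates $1 - 6/1024$.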


23. Asymptotic Shapes of Optimal Stopping Regions for Sequential Testing. 
Gideon Schwarz, Columbia University. (Introduced by T. W. Anderson)


A hypothesis $\theta \le a$ is to be tested sequentially against an alternative $\theta \ge b$ ($a < b$) on
the basis of independent observations on a random variable $X$ whose distribution depends
on a single parameter $\theta$ and is of the Koopman-Darmois type. If $a < \theta < b$ neither decision
is penalized. A given a priori distribution of $\theta$ is assumed. The cost per observation is $c$.
Theorem: If the optimal stopping region in the $(n, \sum X_i)$-plane is transformed by dividing
both coordinates by $\log (1/c)$, the transformed region approaches a finite limiting region
as $c \to 0$. An explicit formula for the limiting region (“asymptotic shape”) for arbitrary
a priori distribution of $\theta$ is given. By transforming the asymptotic shape back to original
scale, approximations to the optimal regions for small $c$ are obtained. The theorem is proved
by showing that the optimal boundary lies between two curves of constant Bayes risk, and
finding the asymptotic shape of such curves. Finally the theorem is extended to two cases
of two-parameter families. One is the case of testing the mean of a normal distribution with
unknown bounded variance. In the other case $H_0$, $H_1$ and the indifference region consist of
three arbitrary specified mutually dominating distributions.





24. Invariant Bayes Rules. (Preliminary Report) Morris Skibinsky, Purdue
University. 


Given a sequence of independent random variables with a common distribution function
known a priori to be one of $K$ specified distribution functions, let $\mathcal{S}$ be a class of rules for
deciding which one of the $K$ is correct. Let $\mathcal{B}(g, \lambda)$ denote the set of all Bayes rules in $\mathcal{S}$
relative to a priori probabilities $g = (g_1, g_2, \ldots, g_K)$, loss matrix $\lambda = (\lambda_{ij})$, and cost per
observation unity. Let $G = \{g: \sum_j g_j = 1, g_j > 0, j = 1, \ldots, K\}$; $\Lambda$ be the space of $K \times K$
loss matrices, with zero diagonal and non-negative off-diagonal elements; $M$ be a mapping
from $G$ into $\Lambda$. The class $\mathcal{B}(M)$ of invariant Bayes rules relative to $M$ is defined to be
$\bigcap \{\mathcal{B}(g, M(g)) \mid g \in G\}$. The class of invariant Bayes rules is then
$\mathcal{B} = \bigcup \{\mathcal{B}(M) \mid \text{all mappings } M\}$. The importance of this class follows from the fact
that if $T \in \mathcal{B}(M)$, then $T$ minimizes the expected number of observations required uniformly
over the $K$ hypotheses, among all rules whose error probabilities are bounded above by its own.
Necessary and sufficient conditions for a rule to be an invariant Bayes rule are given. Several
examples are considered for $K = 3$. For $K = 2$ (and $\mathcal{S}$ the class of all rules) it has been shown (see
Wald and Wolfowitz, “Optimum Character of Sequential Probability Ratio Test”, Ann.
Math. Stat., Vol. 19 (1948)) that $\mathcal{B}$ is equivalent to the sequential probability ratio tests.


25. On a Generalization of Balanced Incomplete Block Designs. J. N. Srivastava
and S. N. Roy, University of North Carolina.


A generalized BIBD (GBIBD) is defined as follows: Let the total number of treatments
$v = v_1 + v_2 + \cdots + v_s$ be divided into $s$ sets, the $i$th set containing $v_i$ treatments. Then
the GBIBD is such that any treatment belonging to the $i$th set occurs (once only) in $r_i$
blocks, a pair from the $i$th set occurs in $\lambda_i$ blocks, and a pair consisting of one treatment
from the $i$th set and another from the $j$th set occurs (once only) in exactly $\mu_{ij}$ blocks. The
design may be arranged in equal or unequal block sizes. The motivation behind the
use of such designs is that it permits treatment contrasts corresponding to the comparisons
within a set or for comparison of different sets among themselves, to be estimated
with any given precision, provided the design with the corresponding values of $\lambda$’s and
$\mu$’s exists. Since with fixed d.f. for hypothesis and error, the power of a test depends on
the noncentrality parameter only, these designs allow the total hypothesis
$H_0: t_1 = t_2 = \cdots = t_v$ to be split up into subhypotheses corresponding to comparisons between or within
sets, the subhypotheses being tested with given powers. The multivariate or multiresponse 
generalization has also been considered. In this paper, the analysis of this design (for the 
case of equal block sizes) has been presented both for fixed and mixed models. Several basic 
relations and inequalities among the parameters have been defined. Some studies on the 
structure of BIBD’s and GBIBD’s have been made. Several methods of construction of 
different kinds of GBIBD’s using known BIBD’s, factorial and other designs, have been 
presented. 
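
The defining conditions can be checked mechanically for a candidate design. The sketch below only illustrates the definition and is not a construction method from the paper: it tallies replications and pair counts and returns the parameters $r_i$, $\lambda_i$, $\mu_{ij}$ when they are constant as the definition requires, or `None` otherwise. The function name and the small two-set design in the note below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gbibd_parameters(blocks, sets):
    """Return (r, lam, mu) if the blocks satisfy the GBIBD counting conditions, else None.

    blocks : iterable of tuples of treatment labels (sortable, e.g. integers)
    sets   : list of lists partitioning the treatments into groups
    """
    set_of = {t: i for i, group in enumerate(sets) for t in group}
    replications, pair_counts = Counter(), Counter()
    for block in blocks:
        if len(set(block)) != len(block):
            raise ValueError("a treatment occurs more than once in a block")
        for t in block:
            if t not in set_of:
                raise ValueError("treatment not assigned to any set")
            replications[t] += 1
        for a, b in combinations(sorted(block), 2):
            pair_counts[(a, b)] += 1

    r, lam, mu = {}, {}, {}
    # Every treatment in the ith set must occur in the same number r_i of blocks.
    for i, group in enumerate(sets):
        counts = {replications[t] for t in group}
        if len(counts) != 1:
            return None
        r[i] = counts.pop()
    # Pair counts must depend only on which sets the two treatments belong to.
    for a, b in combinations(sorted(set_of), 2):
        i, j = sorted((set_of[a], set_of[b]))
        table, key = (lam, i) if i == j else (mu, (i, j))
        count = pair_counts[(a, b)]
        if table.setdefault(key, count) != count:
            return None
    return r, lam, mu
```

For instance, with sets `[[0, 1], [2, 3]]` and blocks (0, 1), (0, 1), (2, 3), (0, 2), (0, 3), (1, 2), (1, 3), the function returns $r_1 = 4$, $r_2 = 3$, $\lambda_1 = 2$, $\lambda_2 = 1$ and $\mu_{12} = 1$ (indexed from zero in the returned dictionaries).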


26. Maximum Likelihood Characterization of the Normal Distribution. Henry 
Teicher, Purdue University.


Let $\{F(x - \theta)\}$, $\theta$ real, be a translation parameter family of absolutely continuous
distributions on the real line. If, for all (random) samples of size two and three, a maximum
likelihood estimator of $\theta$ is the sample arithmetic mean, then $F(x)$ is the normal distribution.
Analogous results are demonstrated in the case of a scale parameter family.
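
The content of the characterization can be illustrated numerically. The sketch below is an illustration only and not part of the proof: it maximizes the log-likelihood of a translation family over a grid, for the normal density and, as an arbitrarily chosen non-normal comparison, the logistic density. The function names, the grid search, and the sample are assumptions made for the example.

```python
from math import exp, log


def grid_mle(sample, log_density, lo, hi, steps=20000):
    """Maximize sum(log_density(x - theta)) over an evenly spaced grid on [lo, hi]."""
    best_theta, best_value = lo, float("-inf")
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        value = sum(log_density(x - theta) for x in sample)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta


def normal_log_density(u):
    """Log of the standard normal density, additive constant omitted."""
    return -0.5 * u * u


def logistic_log_density(u):
    """Log of the standard logistic density exp(-u) / (1 + exp(-u))**2."""
    return -u - 2.0 * log(1.0 + exp(-u))
```

For the sample (0, 1, 5), whose arithmetic mean is 2, the normal family returns a maximizer at the mean, while the logistic family returns a value noticeably below it, in line with the statement that samples of size three already separate the normal distribution from other translation families.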





27. On Two Methods of Unbiased Ratio and Regression Estimation. W. H. 
Williams, McMaster University. (By title)


R. M. Mickey (J. Amer. Stat. Assoc., Vol. 54, pp. 594-612) introduced a procedure for
generating unbiased ratio and regression estimators. Another was used by Williams (Ann. 
Math. Stat., Vol. 29, p. 618). The two procedures have different appearances, but it is shown 
that a slight modification of Mickey’s method will lead to estimators which have the same 
form as those presented by Williams. The combinatorial equivalence is demonstrated also. 


28. On Linear Estimation of a Single Parameter of a Mean Function Under 
Second Order Disturbance. (Preliminary Report) N. Donald Ylvisaker,
Columbia University. 


Let $\{P_\beta, \beta \in \Lambda \subset R_1\}$ be a family of probability measures on $(\Omega, \mathcal{A})$. Let $\{Y(t), t \in T\}$ be a
family of random variables on $(\Omega, \mathcal{A})$ satisfying $E_\beta Y(t) = m(t; \beta)$, $t \in T$,
$E_\beta [Y(s) - m(s; \beta)][Y(t) - m(t; \beta)] = K(s, t)$, $s, t \in T$, $T$ an abstract set. The problem of linear
estimation of the parameter $\beta$ is considered. Let $H(K, T)$ denote the reproducing kernel space of
functions on $T$ associated with $K$. Then the span of $\{Y(t), t \in T\}$ in $L_2(dP_\beta)$, written $V_\beta[Y(t), t \in T]$,
is operationally independent of $\beta$ if $m(\cdot, \beta) \in H(K, T)$ for all $\beta \in \Lambda$ (operational independence
here means a sequence of random variables of the form $\{\sum_j c_{jn} Y(t_{jn})\}$ is a Cauchy sequence
in $L_2(dP_\beta)$ for all $\beta \in \Lambda$ or no $\beta \in \Lambda$). The precise lower bound


$$E_\beta |Z - \beta|^2 \ge \beta^2 / (1 + \|m(\cdot, \beta)\|^2_{H(K,T)})$$


is obtained for $Z \in V_\beta[Y(t), t \in T]$ with the bound interpreted as zero if $m(\cdot, \beta) \notin H(K, T)$.
Bayes estimates of $\beta$ are considered in special cases of the above model.
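
The lower bound can be checked by hand when $T$ is finite. The sketch below assumes $T = \{t_1, t_2\}$ with a positive definite $2 \times 2$ kernel $K$, in which case the reproducing kernel norm is $\|m\|^2 = m' K^{-1} m$; it evaluates $E_\beta |Z - \beta|^2$ as variance plus squared bias for $Z = c_1 Y(t_1) + c_2 Y(t_2)$. The function names and the numerical values are illustrative only.

```python
def quadratic_form_inverse(K, m):
    """m' K^{-1} m for a symmetric positive definite 2x2 matrix K."""
    (a, b), (_, d) = K
    det = a * d - b * b
    if det <= 0:
        raise ValueError("K must be positive definite")
    return (d * m[0] ** 2 - 2 * b * m[0] * m[1] + a * m[1] ** 2) / det


def mean_square_error(c, K, m, beta):
    """E|Z - beta|^2 for Z = c[0] Y(t1) + c[1] Y(t2): variance plus squared bias."""
    variance = sum(c[i] * K[i][j] * c[j] for i in range(2) for j in range(2))
    bias = c[0] * m[0] + c[1] * m[1] - beta
    return variance + bias**2


def lower_bound(K, m, beta):
    """beta^2 / (1 + ||m||^2), with ||m||^2 = m' K^{-1} m for finite T."""
    return beta**2 / (1.0 + quadratic_form_inverse(K, m))


def minimizing_coefficients(K, m, beta):
    """Coefficients c = beta K^{-1} m / (1 + m' K^{-1} m), which attain the bound."""
    (a, b), (_, d) = K
    det = a * d - b * b
    q = quadratic_form_inverse(K, m)
    kinv_m = ((d * m[0] - b * m[1]) / det, (a * m[1] - b * m[0]) / det)
    return tuple(beta * v / (1.0 + q) for v in kinv_m)
```

With $K$ having rows (2, 1) and (1, 2), mean values (1, 2) and $\beta = 3$, the norm is $\|m\|^2 = 2$, the bound is $9/3 = 3$, and the coefficients (0, 1) attain it. The minimizing coefficients depend on $\beta$, so this illustrates the bound rather than a usable estimator.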


29. A Generalization of a Theorem of Balakrishnan. N. Donald Ylvisaker,
Columbia University. (By title)


Let $T$ be an abstract set and let $K$ be a covariance kernel defined on $T \times T$. A function
$m$ defined on $T$ is said to be an admissible mean function for the covariance kernel $K$ if and
only if there exists a family $\{X(t), t \in T\}$ of random variables on some probability space
$(\Omega, \mathcal{A}, P)$ with $E[X(t)] = m(t)$, $t \in T$, $E[X(s) X(t)] = K(s, t)$, $s, t \in T$. Let $H(K, T)$ denote
the reproducing kernel space of functions on $T$ associated with $K$. Theorem: $m$ is an admissible
mean function for the covariance kernel $K$ if and only if $m \in H(K, T)$ and $\|m\|_{H(K,T)} \le 1$.
This is a generalization of a result due to Balakrishnan (Ann. Math. Stat., September,
1959) and provides an alternative proof of that result.





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Arthur Albert received his Ph.D. from Stanford University in October, 1959. 
He is now on a National Science Foundation Post-Doctoral Fellowship at the 
Institute for Mathematical Statistics in Stockholm. 

Sidney Addelman is now an Associate in the Department of Statistics, Iowa
State University, Ames, Iowa. 

J. H. Bailey has accepted a position with IBM, Poughkeepsie, as Associate 
Statistician in Reliability Technology. 

Harold Baker was appointed to the Department of Statistics at Iowa State 
University for the year 1959-60. 

Donald W. Behnken has received his Ph.D. degree in Statistics from North 
Carolina State College. He is now with the Mathematical Analysis Section of 
the American Cyanamid Company at the Research Laboratory in Stamford,
Connecticut.

Austin J. Bonis has taken a position as a Senior Staff Engineer (Reliability) 
with AC, The Electronics Division of General Motors Corporation, Milwaukee, 
Wisconsin. 

Jean-Francois Canu, formerly a Mathematician in the Department of Defense, 
is now studying in preparation for the Catholic Priesthood. His address is St. 
Stephen’s Priory, Dover, Mass. 

Victor Chew, Assistant Statistician, Institute of Statistics, Raleigh, N. C. 
has taken up a position as Mathematical Statistician at the U.S. Naval Weapons 
Laboratory, Dahlgren, Virginia, effective February 1, 1960. 

Ira H. Cisin, formerly Adviser on Research Design at the Human Research 
Office of the George Washington University, Washington, D. C., has accepted 
the post of Project Director of an Epidemiologic Study of Alcoholic Beverage
Use, with the California Department of Public Health, Berkeley, California. 

C. Philip Cox (M.A. Oxford University), Head of the Statistics Section of the 
National Institute for Research in Dairying, Shinfield, England, is a visiting 
Associate Professor of Statistics at Iowa State University for the academic 
year, 1958-60. 

Constance E. Cox is lecturing for at least a year, beginning January 1, 1960, 
at the new academy set up in Djakarta, Indonesia to train statisticians and 
economists for government service. She is on leave from the Food and Drug 
Directorate, Biometrics Section, Ottawa, during this time.

Phelps P. Crump, formerly at the Brookhaven National Laboratory, is now a 
member of the Biometrics Department at the School of Aviation Medicine, 
Brooks AFB, Texas. 

Earl L. Diamond has joined the faculty of the Johns Hopkins University
School of Hygiene and Public Health, Baltimore 5, Maryland, as Assistant
Professor of Public Health Administration (Chronic Diseases) and of Bio- 
statistics. He was formerly with the Department of Biostatistics of the Uni- 
versity of North Carolina School of Public Health. 

Ronald S. Dick, formerly a Reliability Engineer with the Sperry-Rand
Corporation of Great Neck, N. Y., is now working as a Reliability Engineer with 
the American Bosch Arma Corporation—ARMA Division, Garden City, N. Y. 
He still teaches night school at Queens College in the Mathematics Depart- 
ment. 

Melvin D. Fimple has accepted employment with Sandia Corporation, Al- 
buquerque, New Mexico, in the Statistics and Evaluation Section, Reliability 
Department. 

Wayne Fuller (B.S., M.S. U.) was appointed Assistant Professor of Statistics
at Iowa State University, beginning September, 1959. 

Werner Gautschi, Professor of Mathematics at Ohio State University, died 
of a heart attack, October 3, 1959 in Columbus, Ohio. 

John E. Gessford is now employed by International Paper Company as a 
Trainee in the Treasury Department. 

R. Gnanadesikan left Procter and Gamble and the University of Cincinnati
on November 1, 1959 to join the Bell Telephone Laboratories as a member of 
the Technical Staff in the Mathematical Research Department. 

William A. Golomski, formerly Director of Operations Research, Oscar Mayer 
and Company, Madison, Wisconsin is now Vice-President of the H. J. Mayer 
and Sons Company, 6811 Ashland Avenue, Chicago 36, Illinois. 

I. J. Good, formerly at Government Communications Headquarters, has 
moved to the Admiralty Research Laboratory, Teddington, Middlesex. His 
private address is 58A Warren Road, Ashford, Middlesex, England. 

Leo A. Goodman, Professor of Statistics and Sociology at the University of 
Chicago, has been awarded a Senior Post-doctoral Fellowship by the National 
Science Foundation and a Fellowship by the John Simon Guggenheim Memorial 
Foundation for the 1959-60 academic year. He is now at the Statistical Lab- 
oratory of the University of Cambridge, Cambridge, England, on a leave of 
absence from the University of Chicago. 

William C. Guenther has accepted a position as Associate Professor of Statistics 
at the University of Wyoming. 

Donald Guthrie, Jr., is now a Mathematician for the Naval Warfare Research 
Center of the Stanford Research Institute, Menlo Park, California. He was 
formerly Assistant Professor of Mathematics and Mechanics at the U. S. Naval
Postgraduate School. 

Carl Hammer accepted the position of Administrator, Technical Projects 
Coordination, with the Radio Corporation of America, Defense Electronics 
Products. His office is located at 75 Varick Street, New York 13, N. Y.

Theodore W. Horner joined the Reliability and Statistics Section of Booz, 
Allen Applied Research, Corporation, 4921 Auburn Avenue, Bethesda, Maryland
as a Senior Statistician in September, 1959. He was formerly Senior Operations
Research Analyst at General Mills, Minneapolis, Minnesota.

H. Burke Horton, former Director, National Damage Assessment Center, 
and Director, Operations Research Office, O. C. D. M., Washington, D. C. has 
left government service and is now Assistant to the Vice President and General 
Manager, Remington-Rand UNIVAC Division, Sperry-Rand Corporation, 
New York City. Mr. Horton will continue in an advisory capacity as a member 
of the O. C. D. M. Program Advisory Committee. 

William G. Howard, formerly with the Operations Analysis Office at Hq. 
US Air Force, has joined the staff of the Operational Sciences Laboratory of 
the Research Triangle Institute in Durham, N. C. 

J. Stuart Hunter has accepted an appointment as a member of the Mathe- 
matics Research Center, U. S. Army, at the University of Wisconsin, Madison, 
Wisconsin. 

Rex L. Hurst (Ph.D., Cornell), Director of the Statistical Laboratory, Utah
State University, is a Visiting Associate Professor of Statistics at Iowa State 
University for the year 1959-60. 

J. O. Irwin, who spent a year as Visiting Professor at the University of North 
Carolina, Chapel Hill, N. C., has now returned to England. His address is 
M.R.C. Statistical Research Unit, School of Hygiene, Keppel Street (Gower
Street), London W. C. 1, England. 

Emil H. Jebe, formerly Associate Professor of Statistics, Department of 
Statistics and Statistical Laboratory, Iowa State University, Ames, Iowa, is 
now a Research Mathematician, Operations Research Department, Willow 
Run Laboratories, The University of Michigan, Box 2008, Ann Arbor, Michigan. 
He is engaged in the design and analysis of experiments for systems research and 
development: General statistical consultation in the OR Department and for 
Project Michigan. 

Charles Jordan, retired Professor of Mathematics at the University of Buda- 
pest, Hungary and Fellow of the Institute of Mathematical Statistics, died 
December 24, 1959. 

Shriniwas Katti was appointed to the Department of Statistics at Iowa State 
University for the year 1959-60. 

Nathan Keyfitz has left the Dominion Bureau of Statistics, Ottawa, Canada 
to take up a post as Professor of Political Economy at the University of Toronto, 
Canada. 

Melville R. Klauber is now a graduate student at Stanford University, Stan- 
ford, California in the Department of Statistics. 

Carl F. Kossack resigned as Chairman of the Department of Mathematics 
and Statistics at Purdue University, Lafayette, Indiana to accept a position as 
Manager of the Department of Statistics and Operations Research of IBM 
Research, Yorktown Heights, N. Y. 

Scott Krane has accepted a position as Associate in the Department of
Statistics, Iowa State University, Ames, Iowa. Krane is also the recipient of
the 1960 George W. Snedecor Award in Statistics. The award is given each year
to the outstanding Ph.D. candidate and carries with it a cash award as well as 
a year’s membership in The Institute of Mathematical Statistics and a year’s 
subscription to The Annals of Mathematical Statistics. 

H. O. Lancaster was appointed Associate Professor in the Department of 
Preventive Medicine in the University of Sydney on March 2, 1959. He has 
since resigned from this position to accept the appointment to the newly created 
Chair of Mathematical Statistics in the University of Sydney, effective June, 
1959. 

Harold Larson is now an Associate in the Department of Statistics, Iowa
State University, Ames, Iowa. 

Erich L. Lehmann has returned to the Department of Statistics of the
University of California, Berkeley, California after a year’s leave of absence spent
at the University of Zurich, Switzerland. 

S. P. H. Mandel (B.A., M.A. Sydney University, Sydney, Australia), was
appointed Assistant Professor at Iowa State University beginning September, 
1959. 

Benoit Mandelbrot, of the University of Lille, France is with the International 
Business Machines Research Center, Yorktown Heights, N. Y. 

George F. T. Mayer died June 22, 1959. 

Alan J. Mayne is now with the Electronic Computing Laboratory of the 
University of Leeds, England where he has been appointed to a Research
Fellowship. His duties there include statistical research.

Norman Morse is now working in the Reliability Group of the General Electric 
Advanced Electronics Center at Ithaca, New York. 

Joseph M. Moser received his Ph.D. from St. Louis University, St. Louis, 
Missouri in June, 1959. He has been appointed Assistant Professor in the Mathe- 
matics Department at San Diego State College, San Diego, California for the 
academic year 1959-60. 

Joseph A. Navarro has left his position with General Electric Company to 
take a position with the Operations Research and Statistics Department, IBM 
Research at Yorktown Heights, N. Y. 

Jose Nieto was appointed to the Department of Statistics at Iowa State
University for the year 1959-60.

Wade A. Norton, formerly a Teaching Fellow in the Department of Mathe- 
matics at Auburn University, Alabama is now Instructor in Mathematics and 
Coach of B-Team Athletics at West End High School, Birmingham, Alabama. 

G. B. Oakland, formerly Chief of the Statistical Research and Services, 
Canada Department of Agriculture, has been appointed Senior Research
Statistician, the Dominion Bureau of Statistics, Ottawa, as of January 1, 1960.

Kamini M. Patwary, formerly Assistant Professor in Mathematical Statistics 
at Howard University, Washington, D. C., and part-time instructor in Statistics 
at the University of Maryland, College Park, Maryland, has now joined the 
World Health Organization as a Statistician in the Division of Communicable
Diseases at the Palais des Nations, Geneva. He completed the requirements for
the Ph.D. in Statistics in November, 1959 and the formal degree will be con- 
ferred at the June, 1960 Commencement. 

K. C. S. Pillai has returned to the Statistical Office of the United Nations at
New York after spending more than three years in the Philippines where he 
was United Nations Senior Adviser in Mathematical Statistics and Visiting 
Professor of Statistics at the Statistical Center, University of the Philippines, 
Manila. 

Paul L. Poston received the Doctorate of Business Administration from 
Harvard University in May, 1959. He has been appointed to the faculty of the 
School of Business Administration of Northeastern University, Boston, Mass. 
He was formerly Vice President-Actuary for the Great Lakes Mutual Life 
Insurance Company, Detroit, Michigan and a part-time faculty member of 
Wayne State University. 

D. V. Rajalakshman, Professor of Statistics, University of Madras, Madras, 
India has taken leave of absence for a year from October, 1959 to work at the 
Cowles Foundation, Yale University, with a grant from the Rockefeller Founda- 
tion. 

Wyman Richardson has joined the Operations Research Department of the 
Willow Run Laboratories, University of Michigan, Ann Arbor, Michigan. 

Willard C. Riss, Jr., Professor at Knox College, Galesburg, Illinois, died 
November 9, 1959. 

Gene Rove, former Director of Operations of the Electronics Division of 
Gruen Industries, has joined the Space Technology Laboratories, Inc. Van 
Nuys, California as a Senior Staff Engineer.

Jack Sawyer is Lecturer and Postdoctoral Research Fellow in Quantitative
Methodology in the Department of Sociology at the University of Chicago. 

Herbert Scarf, Assistant Professor in the Statistics Department of Stanford 
University, Stanford, California is spending the 1959-60 academic year at the 
Cowles Foundation for Research in Economics at Yale. 

Perry Scheinok has left Indiana University and is now an Instructor in 
Mathematics at Wayne State University, Detroit, Michigan. 

K. C. Seal is now Senior Deputy Director, Labor Bureau, Kennedy House, 
Simla 4, India. 

Shayle R. Searle has recently returned to New Zealand after completing three 
years in the Animal Husbandry Department, Cornell University. He has re- 
turned to this position as Research Statistician in the Herd Improvement Section 
of the New Zealand Dairy Board, Box 866, Wellington, New Zealand. 

Norman C. Severo has accepted an appointment with the University of Buffalo 
as an Associate Professor of Mathematical Statistics. 

B. V. Shah was appointed to the Department of Statistics at Iowa State 
University for the year 1959-60. 

Richard H. Shaw is now employed as Staff Statistician with the IBM Research 
Center in Yorktown Heights, N. Y. 








M. M. Siddiqui, formerly with the Institute of Statistics and the Social Re- 
search Center of the Punjab University, Lahore, Pakistan, is now with the 
Boulder Laboratory of the National Bureau of Standards, Boulder, Colorado. 

Frederick A. Sorensen received the Ph.D. in Mathematics at the Carnegie 
Institute of Technology in June, 1959. He is employed as a Statistician at the
Applied Research Laboratory of United States Steel Corporation at Monroeville, 
Pa. The title of his thesis was “Operating Characteristics of the Control Chart 
for Sample Means as a Test for Model Alternatives.” 

H. C. Sweeney has left Atlantic Refining Company and is now acting as a 
consultant in statistics for industry. His new address is 1518 Sunny Hill Lane, 
Havertown, Pa. 

Alan E. Treloar is now Chief of a newly organized Statistics and Analysis 
Branch (DRG) of the National Institutes of Health in Bethesda, Md. The
function of the branch is to establish and maintain all necessary statistical
information systems concerning the research and training grants and fellowship and
training awards programs of the Public Health Service administered by the 
National Institutes of Health. 

Donald R. Truax has been appointed to an Assistant Professorship at the 
University of Oregon, Eugene, Oregon. 

John W. Tukey of Bell Telephone Laboratories, Murray Hill, N. J. and 
President of the Institute of Mathematical Statistics has been appointed to 
the President’s Science Advisory Committee. 

Raymond J. Twery, formerly with Gardner Advertising, is now a Mathe- 
matical Statistician with the Research Office of the Combat Development 
Experimentation Center, Fort Ord, California. 

Clifford A. Wallace, Director of Quality Control for Eastman Kodak Com- 
pany, Operations and Apparatus Division, Rochester, N. Y., retired from the 
company on January 1, 1960 after 41 years of service. He and his wife are living 
at 267 Oxford Street, Rochester. 

Yin Huai Wang is now with RCA as Engineer and Statistician in the Electron 
Tube Division at their plant in Marion, Indiana. 

Harry Weingarten received his Ph.D. in Mathematical Statistics from George 
Washington University. His thesis title was ““The Law of Large Numbers and 
Related Theorems.” 

George F. Woodward has accepted the position of Manager-438 Program 
with the Information Systems Section, Defense Systems Department, General 
Electric Co., in Washington, D. C. His new address is 6411 Stratford Rd., 
Chevy Chase 15, Md. 

Peter W. Zehna received his Ph.D. in Mathematical Statistics from Stanford
University in October, 1959. He is now Assistant Professor of Mathematics at 
Colorado State College in Greeley, Colorado. 

George Zyskind was appointed Assistant Professor of Statistics at Iowa State 
University beginning September 1959. 





NEW MEMBERS 


The following persons have been elected to membership in the Institute 


Abdel-aty, Soliman Hasan, Ph.D. (University College, London University); Visiting 
Fellow, Princeton University, 1959-60, and Expert Statistician of The Central Ministry 
of Education, Cairo, U. A. R.; 20 Vandeventer Avenue, Princeton, N. J. 

Al-Doori, Younis A., M.A. (University of Oregon); Teaching Assistant, University of 
Washington, Seattle, Washington; 4073 Union Bay Circle, Seattle 5, Wash. (Reinstated)

Ali, Mir. M., M.S. (University of Michigan); Teaching Fellow and Graduate Student; De- 
partment of Mathematics, University of Toronto, Toronto, Ontario, Canada. 

Asano, Chooichiro, M.S. (Mathematical Institute, Kyushu University); Quality Control 
Engineer, Shionogi and Company, Osaka City, Japan; 180, Mashilashimo, Mishima 
Machi, Osaka Prefecture, Japan. 

Baltz, Henry J., Vice President, Perpetual Building Association, 500 Eleventh Street, N. W., 
Washington 4, D.C. 

Bennett, Carl Melvin, B.S.E.E. (Alabama Polytechnic Institute); Graduate Assistant in 
Mathematics; Mathematics Department, Alabama Polytechnic Institute, Auburn, 
Alabama. 

Berk, Robert H., M.S. (Massachusetts Institute of Technology); Student, Department of 
Statistics, Harvard University, Cambridge, Mass. (38) 

Burgess, Peter Dewey, M.S. (N. C. State College); Research Staff Member, Radiation, 
Inc., Orlando, Florida; 2327 Middleton Ave., Winter Park, Fla. 

Carlyle, John W., M.S. (University of Washington); Associate in Electrical Engineering, 
University of California; 2441 Haste Street, Berkeley 4, Calif. 

Carroll, Charles L., Jr., Ph.D. (University of North Carolina); Manager of RCA Quality 
Analysis, Missile Test Project, RCA Service Company, Patrick Air Force Base, Florida; 
109 Michigan Avenue, Melbourne, Florida. 

Cassady, Mrs. Janet Pancoast (C. F.), A.B. (Florida State University); Research Associ- 
ate, Biometric Unit, New York State College of Agriculture; Warren Hall, Cornell 
University, Ithaca, N. Y. 

Chatterji, Srishit Dhar, M.S. (Lucknow University); Research Assistant; Department of 
Statistics, Michigan State University, East Lansing, Michigan. 

Chung, James H., Ph.D. (University of Toronto); Assistant Professor; Department of Mathe- 
matics, University of Toronto, Ontario, Canada. 

Fountain, J.G., B.S. (University of California); Reliability Engineer; Nortronics, Division 
of Northrop Corporation, 222 N. Prairie, Hawthorne, California.

Goode, Jamie J., M.S. (Georgia Institute of Technology); Student, Department of Sta- 
tistics, University of North Carolina, Chapel Hill, N. C.; 111 Johnson Street, Chapel 
Hill, N.C. 

Granger, Clive William John, Ph.D. (University of Nottingham, England); Lecturer in 
Mathematics at University of Nottingham and Visiting Fellow, Princeton University, 
Princeton, N. J.; Econometrics Research Project, 92-A Nassau Street, Princeton, 
N. J.

Halsted, Leonard R., M.A. (University of Michigan); Research Assistant, University of 
Michigan Research Institute, Ann Arbor, Michigan; 1613 East Stadium, Ann Arbor, 
Michigan. 

Johnson, Jerry Jon, B.S. (University of Illinois); Student, University of Illinois, Urbana, 
Illinois; 201 South Lincoln, Urbana, Illinois. 

Kuiper, Nicolaas Hendrik, Ph.D. (University of Leiden); Professor of Mathematics and 
Statistics, Landbouwhogeschool, Wageningen, Netherlands. 

Levy, Joel, M.A. (Johns Hopkins University); Mathematician, Bureau of Supplies and 
Accounts, U. S. Navy, Washington, D. C.; 1526 17th Street, N. W., Washington 6, D.C. 





NEWS AND NOTICES 547 


Mathot, Eugene F., B.S. (Massachusetts Institute of Technology); Mathematician, The 
Occidental Life Insurance Co. of California, Los Angeles, California; 588 East Avenue 
39, Los Angeles 31, California. 

McGregor, John Ross, Ph.D. (Cambridge University); Assistant Professor of Mathematics, 
University of Alberta, Edmonton, Alberta, Canada. 

McKeon, Alfred J., B.A. (George Washington University); Chief, Quality Control and 
Operations Research, EOD, Bureau of the Census, FOB 8, Suitland, Md. 

Mizuki, Mikiso, M.S. (Purdue University); Graduate Student, Harvard University; 67 
Dartmouth Street, Boston 16, Mass. 

Nagai, Takeaki, B.S. (Kyushu University); Student; Mathematical Institute, Faculty of 
Science, Kyushu University, Fukuoka, Japan. 

Naruishi, Joseph I., M.S. (University of Illinois); Assistant Mathematician, Armour Re- 
search Foundation, 10 West 35th Street, Chicago, Illinois; 3140 South Michigan, Chicago, 
Illinois. 

Ogawara, Masami, Ph.D. (Kyushu University); Professor of Mathematical Statistics; 
Tokyo Woman's College, Iogi, Suginami, Tokyo, Japan. 

Poirier, B. W., Zeugnis (University of Vienna, Austria); Analytical Statistician; 4500 
Sixth Street, South, Arlington 4, Va. 

Prins, Hendrik Johan, Research Laboratories, N. V. Philips Gloeilampen Fabrieken, 
Eindhoven, Netherlands; Allard Du Hamelstraat 83, Eindhoven, Netherlands. 

Pugh, Edward L., B.E.E. (University of Santa Clara); Associate Engineer, Ryan Aero- 
nautical Company, Lindberg Field, San Diego, California; Advanced Design Section, 
Ryan Aeronautical Company, Lindberg Field, San Diego, California. 

Sadasivan, Govindan G., M.S. (University of Travancore); Assistant Professor of Sta- 
tistics, Indian Agricultural Research Institute, Pusa, New Delhi, India; c/o Institute 
of Agricultural Research Statistics, Pusa, New Delhi, India. 

Sakanashi, Genichi, M.S. (University of Kyushu); Assistant Professor of Mathematics; 
Hyuga Gakuin Junior College, 110 Yamato-cho, Miyazaki-shi, Kyushu, Japan. 

Siotani, Minoru, Kogaku-si (University of Tokyo); Chief of Research Room 3, Second 
Division; Institute of Statistical Mathematics, 1 Azabu-fujimi-cho, Minato-ku, Tokyo, 
Japan. 

Slotsky, Alice W. (Mrs. Gordon J.), B.A. (Bryn Mawr College); Actuarial Trainee, Metro- 
politan Life Insurance Company, 1 Madison Avenue, New York, N. Y.; Apartment ? P 
Hampton House, 123-35 82nd Road, Kew Gardens, N.Y. 

Srivastava, Siyaram, M.Sc. (Lucknow University); Graduate Teaching Assistant; Sta- 
tistical Laboratory, Purdue University, Lafayette, Indiana. 

Stone, William Matthewson, Ph.D. (Iowa State College); Associate Professor, Oregon 
State College, Corvallis, Oregon and Research Specialist, Boeing Airplane Company, 
Seattle, Washington; 15609 Ambaum Road, Seattle, Washington. (66) 

Sumerlin, William T., E.E. (Stanford University); Manager, Reliability Administration, 
Government and Industrial Division, Philco Corporation, 4700 Wissahickon Avenue, 
Philadelphia 44, Pa.; Box 215, Radnor, Pennsylvania. 

Teitlebaum, Albert David, M.S. (McGill University); Lecturer in Mathematics, McGill 
University, Montreal, Canada; Department of Mathematics, McGill University, Montreal, 
Canada. 

Ung, L. Tom, B.C.E. (Rensselaer Polytechnic Institute); Senior Engineer, Esso Research 
and Engineering Company, Florham Park; P. O. Box 209, Madison, New Jersey. 

Walbesser, William J., M.S. (Stevens Institute of Technology); Section Head; Cornell 
Aeronautical Laboratory, Box 236, Buffalo 21, New York. 

Wetherill, G. Barrie, Ph.D. (London University); Lecturer in Statistics; Mathematics 
Department, Birkbeck College, Malet Street, London W. C. 1, England. 

William, James S., M.S. (Agricultural and Mechanic College of Texas); Graduate As- 
sistant; Department of Experimental Statistics, N.C. State College, Raleigh, N.C. 





Woodall, Rosalie C., M.S. (Florida State University); Mathematical Statistician, Diamond 
Ordnance Fuze Laboratories, Connecticut Avenue and Van Ness Street, N. W., Wash- 
ington 25, D. C.; 5815 Connecticut Avenue, N. W., Washington 15, D.C. 


AMS-IMS TRANSLATION PROGRAM 


The IMS in conjunction with the AMS is sponsoring a series of translations 
of articles in foreign languages, particularly in Russian and Chinese. The first 
volume of 500 pages is now in preparation. Suggestions for articles of current 
interest are chiefly desired, although suggestions for older articles will also be 
welcome. Suggestions should be sent to Russian Translation Project, American 
Mathematical Society, 190 Hope Street, Providence 6, Rhode Island. 




SUMMER INSTITUTE FOR COLLEGE TEACHERS 


Professors T. A. Bancroft and Oscar Kempthorne of Iowa State University 
and Professors E. C. Bryant and Robert White of the University of Wyoming 
taught for the two-month duration of the 1959 Summer Institute for College 
Teachers of Statistics, sponsored by the National Science Foundation and 
presented jointly by the University of Wyoming and Iowa State University at 
Laramie. There were 64 participants representing colleges in 34 different states. 
Special weekly seminars were presented by Professors A. H. Bowker, W. E. 
Deming, Franklin Graybill, Morris Hansen, H. O. Hartley, D. V. Huntsberger, 
and H. O. Wold. 


TRAINEESHIPS FOR PUBLIC HEALTH STATISTICIANS 


The Public Health Service has announced the availability of traineeships for 
graduate training of professional public health personnel during the 1960-61 
academic year. 

Traineeships in public health statistics are available to qualified persons. 
They provide stipends from a minimum of $250 per month for a post-bachelor 
candidate to a maximum of $400 per month for a post-doctoral candidate and 
additional allowances for dependents, travel of the trainee, and academic tuition 
and fees. 

Additional information and application forms may be secured from the Division 
of General Health Services, Public Health Service, U. S. Department of Health, 
Education, and Welfare, Washington 25, D. C. 





RECOGNITION GIVEN LOYOLA LAB 


The Administration of Loyola University, Chicago, Illinois, has recently 
given official recognition to the Loyola Psychometric Laboratory. The Laboratory 
has been in operation for approximately two years, both for research and for 
the training of graduate students in psychometrics, and publishes limited editions 
of its research activities. 

The staff includes the following persons: H. J. A. Rimoldi (M.D., Ph.D.), 
Director; J. R. Devane (Ph.D.); Sister M. Canisia (Ph.D.); T. F. Grib (M. A.); 
and J. V. Haley (M. A.). 




DOCTORAL DISSERTATIONS IN STATISTICS, 1959 


Listed below are doctorates conferred during the year 1959 in the United 
States for which the dissertations were written on topics in statistics or related 
fields. The university, major subject, and the title of the dissertation are given 
in each case. Readers are invited to notify the Editor of any omissions from the 
list. 


James H. Abbott, University of Illinois, major in mathematics, “Topics in Information 
Theory.” 

Arthur Albert, Stanford University, major in statistics, “The Sequential Design of Ex- 
periments for Infinitely Many States of Nature.” 

David W. Alling, Cornell University, major in statistics, “The Representation of Follow-up 
Experience in Chronic Disease as a Probability Process.” 

Abdur R. Ansari, Virginia Polytechnic Institute, major in statistics, “Two-Way Rank- 
Sum Tests for Variances.” 

Richard E. Beckwith, Purdue University, major in statistics, “Analytic and Computation 
Aspects of Dynamic Programming Processes of High Dimension.” 

Donald W. Behnken, North Carolina State College, major in experimental statistics, 
“Simplex-sum designs—a class of second order rotatable designs derivable from those 
of first order.” 

Vasant Prabhakar Bhapkar, University of North Carolina, major in statistics, “Con- 
tributions to the Statistical Analysis of Experiments with One or More Responses 
(Not Necessarily Normal.)” 

Austin J. Bonis, The George Washington University, major in statistics, “Solution to 
Problems in Solomon Kullback’s Information Theory and Statistics.” 

Billy J. Boyer, Purdue University, major in mathematics, “Summation of Trigonometric 
Series by a Generalization of the Cesaro Method.” 

Leroy S. Brenna, Virginia Polytechnic Institute, major in statistics, “Factorial Treatments 
in Lattice Designs.” 

Byron William Brown, Jr., University of Minnesota, major in biostatistics, “Some Proper- 
ties of the Spearman Estimator in Bioassay.” 

Whitfield Cobb, University of North Carolina, major in statistics, “Studies in Uni- 
variate and Multivariate Variance Components Analysis Connected with Sampling 
from a Finite Population.” 

William Dollard Commins, Jr., Stanford University, major in statistics, “Asymptotic 
Variance as an Approximation to Expected Loss for Maximum Likelihood Estimates.” 





Herbert R. Domke, Harvard University, major in biostatistics, “Social Class and the 
Childhood Diseases.” 

Richard E. Dowd, Purdue University, major in mathematics, “Vector and Operator 
Valued Radon Measures and Distributions.” 

Raymond J. Dry, Harvard University, major in education, “An Application of a Theory 
of Pure Factor Tests to the Construction of Homogeneous Tests.” 

Willard L. Eastman, Harvard University, major in economics, “Linear Programming With 
Pattern Constraints.” 

Roger H. Farrell, University of Illinois, major in statistics, “Sequentially Determined 
Bounded Length Confidence Intervals.” 

Charles F. Federspiel, North Carolina State College, major in experimental statistics, 
“Simplex-sum Designs—a Class of Second Order Rotatable Designs Derivable From 
Those of First Order.” 

Antranig Gafarian, University of California, major in mathematical statistics, “On Con- 
fidence in Polynomial Regression.” Los Angeles. 

William A. Glenn, Virginia Polytechnic Institute, major in statistics, “Some Aspects of 
Paired-Comparison Experiments.” 

Charles J. Grayson, Jr., Harvard University, major in business, “Decisions Under Un- 
certainty: Drilling Decisions by Independent Oil and Gas Operators.” 

Samuel W. Greenhouse, The George Washington University, major in statistics, “In- 
formation Theory and the Statistical Problem of Discrimination.” 

Joseph Arthur Greenwood, Harvard University, major in statistics, “Distribution Theory 
of Some Angular Variates.” 

David Lee Hanson, Indiana University, major in probability and statistics, “Contributions 
to Decision Theory, Ergodic Theory, and Stochastic Processes.” 

Madhav Pandurang Heble, Indiana University, major in statistics and probability, “Linear 
Estimation of Regression Coefficients; Orthogonal Matrix Polynomials & Application 
to Multidimensional Weakly Stationary Processes; Interpolation & Regression.” 

Syed A. Husain, Purdue University, major in mathematics, “Conveyance Factors and 
Summability of Orthogonal Expansions.” 

Donald G. Johnson, Purdue University, major in mathematics, “A Structure Theory for 
a Class Lattice Ordered Rings.” 

Richard M. Karp, Harvard University, major in applied mathematics, “Some Applications 
of Logical Syntax to Digital Computor Programming.” 

Morris W. Katz, University of Illinois, major in statistics, “Admissible and Minimax 
Estimates of Parameters in Truncated Spaces.” 

Thomas R. Knapp, Harvard University, major in education, “Two-group Classification in 
the Absence of a Criterion.” 

Frank Bardsley Knight, Princeton University, major in probability, “Construction of 
Diffusion Processes.” 

John J. Korbel, Harvard University, major in economics, “A Decision Unit Model for the 
Labor Force.” 

Huan Pao Kuang, University of Minnesota, major in statistics, “The Theory of the Metric 
Function and its Application.” 

James Albert Lechner, Princeton University, major in statistics, “Optimum Decision for 
the Comparison of Two Poisson Processes Based On Minimization.” 

John E. Mack, Purdue University, major in mathematics, “The Order Dual of the Space of 
Radon Measures.” 

Frank G. Martin, Jr., North Carolina State College, major in experimental statistics, 
“An Empirical Study of the Effect of Linkage on Progress from Selection.” 

Robert H. McDowell, Purdue University, major in mathematics, “Extensions of Continuous 
Mapping From Dense Spaces.” 





Benjamin H. McLemore, Jr., University of Illinois, major in statistics, “On Test Effi- 
ciency Against Normal Alternatives.” 

Manavazhi V. Menon, Ohio State University, major in statistics, “Interblock and Inter- 
block Estimates.” 

Edward Olaf Nelson, University of Minnesota, major in mathematics, “A Solution of the 
Generalized Heat Flow Equation in a Bounded Region as a Wiener Integral.” 

Togo Nishiura, Purdue University, major in mathematics, “Analytic Theory of Continuous 
Transformations.” 

James A. Norton, Jr., Purdue University, major in statistics, “Tests of Hypotheses in the 
Case of Unequal Variances.” 

Bernard S. Pasternack, North Carolina State College, major in experimental statistics, 
“Reversal Functions of a Test Procedure and Their Associated Bounds when Data are 
Incomplete.” 

Richard Frederick Potthoff, University of North Carolina, major in statistics, “Multi- 
Dimensional Incomplete Block Designs.” 

Frank Proschan, Stanford University, major in statistics, “Polya-Type Distribution in 
Renewal Theory, with an Application to an Inventory Problem.” 

Malempati Madhusudana Rao, University of Minnesota, major in statistics, “Properties 
of Maximum Likelihood Estimators in Non-stable Stochastic Difference Equations.” 

Dwijendra Kumar Ray-Chaudhuri, University of North Carolina, major in statistics, 
“On the Application of the Geometry of Quadrics to the Construction of Partially 
Balanced Incomplete Block Designs and Error Correcting Binary Codes.” 

Donald Leonard Richter, University of North Carolina, major in statistics, “Two-Stage 
Experiments for Estimating a Common Mean.” 

Donald Merle Roberts, Stanford University, major in statistics, “Approximations to 
Optimal Policies in a Dynamic Inventory Model.” 

Perry Scheinok, Indiana University, major in probability and statistics, “The Error on 
Using the Asymptotic Variance and Bias of Spectrograph Estimates for Finite Observa- 
tion Time.” 

Richard H. Shaw, Purdue University, major in statistics, “Multivariate Classification with 
Normal Alternatives.” 

Robert Silverman, Ohio State University, major in mathematics, “A Metrization for Power 
Sets with Applications to Combinatorial Analysis.” 

Frederick A. Sorensen, Carnegie Institute of Technology, major in mathematics, “Operat- 
ing Characteristics of the Control Chart as a Test for Model I Alternatives.” 

Thomas H. Starks, Virginia Polytechnic Institute, major in statistics, “Tests of Signifi- 
cance for Experiments Involving Paired Comparisons.” 

William E. Thompson, Purdue University, major in mathematics, “Asymptotic Behavior 
and Stability for Systems of Nonlinear Differential Equations.” 

Malcolm E. Turner, North Carolina State College, major in experimental statistics, “The 
Single Process Law: A Study in Nonlinear Regression.” 

Dale Elthon Varberg, University of Minnesota, major in mathematics, “Some Radon- 
Nikodym Derivatives Associated with Stochastic Processes.” 

William T. Wells, North Carolina State College, major in experimental statistics, “Some 
Contributions to the Solution of Some Statistical Problems in the Exterior Ballistics 
of Spin-stabilized Rockets.” 

Harry Weingarten, The George Washington University, major in statistics, “The Law of 
Large Numbers and Related Theorems.” 

David Anson Woodward, University of Minnesota, major in mathematics, “A Linear 
Transformation of Double Wiener Integrals.” 

Peter Zehna, Stanford University, major in statistics, “Optimal Inventory Depletion. 

Neal Zierler, Harvard University, major in mathematics, “On General Measure Theory.” 





William J. Zimmer, Purdue University, major in statistics, “Sampling Inspection with a 
Given Discriminating Power.” 




REPORT OF THE PRESIDENT FOR 1959 


The Institute is now a going concern which rolls along on its own momentum. 
The year 1959 was a normal one in its progress. 

The most pressing problem which confronted the Institute was the ever 
mounting deficit of the Annals. A number of measures have been adopted to 
deal with this. The most important of these was the institution of page charges 
for publication, authorized by the Council and membership of the Institute at 
their meetings in Washington in December. Under this system the author’s 
institution will be billed for the publication of his papers. The decision as to 
whether a paper will be accepted or rejected for publication will be made solely 
on its merits, without reference to whether payment will be forthcoming or not, 
by an authority completely divorced from the financial process. Institutions will 
not be dunned for payment, nor will individual authors not connected with 
institutions be compelled to pay. It is expected that this system will work in 
practice so that, as usual, the financial burden will be borne by the wealthier 
American universities and certain agencies. 

The burden of the Institute’s work is borne by the members of its committees 
and editorial board. Most members of the Institute are very busy, but some 
willingly respond to a request to adopt additional burdens in the service of the 
Institute. I wish cordially to thank all who responded to my requests to serve 
on committees. The help and advice of the Secretary lightened considerably the 
chores of the presidency. 


J. Wolfowitz 


———— 


IMS OFFICERS, COMMITTEES, AND REPRESENTATIVES—1959 


Council Members and Officers 


Terms Expire 1960      Terms Expire 1961      Terms Expire 1962 
David Blackwell        F. J. Anscombe         T. W. Anderson 
Harold Hotelling       T. E. Harris           J. L. Hodges 
Jerzy Neyman           Leo Katz               Z. W. Birnbaum 
I. R. Savage           S. S. Wilks            W. Hoeffding 


President: John W. Tukey               Secretary: G. E. Nicholson, Jr. 
President-Elect: E. L. Lehmann         Treasurer: A. H. Bowker 
Editor: W. H. Kruskal 
Program Coordinator: Dorothy Morrow Gilford 
Associate Secretaries: Central: Jack Silber 
                       Eastern: Joan Rosenblatt 
                       Western: Gerald J. Lieberman 





Chairmen of Committees 


IMS COMMITTEE ON EXCHANGES: P.S. Dwyer 

IMS COMMITTEE ON FELLOWS: Z. W. Birnbaum 

IMS FINANCE COMMITTEE: Howard Levene 

IMS MEMBERSHIP COMMITTEE: Benjamin Epstein 

IMS COMMITTEE ON INSTITUTIONAL MEMBERS: Mervin Muller 

IMS COMMITTEE ON PROFESSIONAL STANDARDS: Joseph Levy 

IMS PROGRAM COMMITTEE FOR 1959 ANNUAL MEETING: Emanuel 
Parzen 

IMS PROGRAM COMMITTEE FOR 1959 EASTERN REGIONAL: Boyd 
Harshbarger 

IMS PROGRAM COMMITTEE FOR 1959 CENTRAL REGIONAL: Frank- 
lin Graybill 

IMS COMMITTEE FOR SPECIAL INVITED PAPERS: K. L. Chung 

IMS SUBSCRIPTIONS COMMITTEE: Edward Coleman 

AD HOC COMMITTEE TO INVESTIGATE THE POSSIBILITY OF 
BILLING FOR PUBLICATION IN THE ANNALS: A. M. Mood 

IMS COMMITTEE ON MATHEMATICAL TABLES: D. B. Owen 

IMS COMMITTEE ON RUSSIAN TRANSLATIONS: Ingram Olkin 

IMS REPRESENTATIVE TO AAAS: Harold Hotelling 

AMERICAN STANDARDS ASSOCIATION COMMITTEE ON STATIS- 
TICAL NOMENCLATURE IMS REPRESENTATIVE: P. G. Hoel 

IMS REPRESENTATIVE IN DIVISION OF MATHEMATICS NATIONAL 
RESEARCH COUNCIL: W. Allen Wallis 

REPRESENTATIVES TO CONFERENCE ORGANIZATION OF THE 
MATHEMATICAL SCIENCES: W. Murray Rosenblatt, Z. W. Birnbaum 

IMS COMMITTEE TO ORGANIZE 1960 ANNUAL MEETING: E. W. 
Barankin 

REPRESENTATIVE TO ORGANIZING COMMITTEE OF 4TH. BERKE- 
LEY SYMPOSIUM: Herbert Robbins 

IMS COMMITTEE FOR 25TH. ANNIVERSARY: Boyd Harshbarger 

IMS 1960 NOMINATING COMMITTEE: Samuel Karlin 


REPORT OF THE LAFAYETTE, INDIANA MEETING OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 


The 1960 Central Regional Meeting, the eighty-third meeting of the Institute of 
Mathematical Statistics, was held at Purdue University, Lafayette, Indiana on 
April 7-9, 1960. The meetings were held jointly with the Biometric Section and 
the Section for Physical and Engineering Sciences of the American Statistical 
Association. Ninety-eight people registered for the meetings. Registration and 
all technical sessions were held at the Purdue Memorial Center on the Purdue 
campus. The appointed officers for the meeting were Program Coordinator, 
D. M. Gilford, Office of Naval Research; Associate Secretary, J. Silber, Roosevelt 
University; Assistant Secretary, H. E. McKean, Purdue University; Program 
Chairman, F. J. Anscombe, University of Chicago. 

The program was as follows: 


THURSDAY, APRIL 7, 1960 


9:15-9:30 a.m.—Welcome to the Societies 


Chairman: Irving W. Burr, Purdue University. Welcoming Remarks, William L. Ayres, 
Dean of the School of Science, Education and Humanities, Purdue University. 


9:30-10:50 a.m.—Probability 


Chairman: Donald M. Roberts, University of Illinois. 
1. Characteristic Functions and Factor Closed Families, Henry Teicher, Purdue Uni- 
versity. 
2. Applications of Characteristic Functions in Statistics, R. G. Laha, Catholic University 
of America. 


11:00-12:30 p.m.—Bioassay (BA/ASA and IMS) 


Chairman: F. M. Hemphill, National Inst. of Health. 
1. The Spearman Estimator for Serial Dilution Assays, Eugene A. Johnson and Byron 
W. Brown, University of Minnesota. 
2. Some Techniques for Establishing an Assay Standard in a Single Laboratory, James A. 
Norton, Purdue University. 
Discussants: Paul Meier, University of Chicago, and Helen Bozivich, Purdue Uni- 
versity. 


1:00-2:20 p.m.—Computers 


Chairman: Virgil L. Anderson, Purdue University. 
1. The Use of Computers in Statistical Education and Research, Raymond J. Nelson and 
Fred C. Leone, Case Institute of Technology. 
2. The Computer in the Construction of Mutually Orthogonal Latin Squares, R. C. Bose, 
Case Institute of Technology and University of North Carolina, I. M. Chakravarti, 
University of North Carolina, and D. E. Knuth, Case Institute of Technology. 
Discussants: John W. Hamblen, University of Kentucky, and M. E. Terry, Bell Tele- 
phone Laboratories. 


2:30-4:20 p.m.—Mixed Topics 


Chairman: George Resnikoff, Illinois Institute of Technology and University of Chicago. 
1. Implications of Objectivism, Raymond Wrighton, University of Minnesota and Uni- 
versity of Birmingham. 
2. The Effect of Grouping on Estimation, Peter Frank, Syracuse University. 
3. On the Problem of Construction of Orthogonal Latin Squares of Side 4n + 2, Esther 
Seiden, Northwestern University. 


4:30-5:30 p.m.—Special Address 


Chairman: Johan H. B. Kemperman, Purdue University. 
Boundary Theory in Random Walk, J. L. Doob, University of Illinois. 





FRIDAY, APRIL 8, 1960 
8:30-10:20 a.m.—Order Statistics 


Chairman: F. J. Anscombe, University of Chicago. 
1. Efficiencies of Estimators of Scale and Location Parameters Constructed from Order Sta- 
tistics of Censored Samples, John A. Tischendorf, Bell Telephone Laboratories, 
Allentown, Pa. 
2. A Central Limit Theorem for One-Sample Non-parametric Procedures, Z. Govindara- 
julu, University of Minnesota. 
3. Order Statistics of Partial Sums, J. G. Wendel, University of Michigan. 


10:30-12:20 p.m.—Theoretical Statistics 


Chairman: Paul Meier, University of Chicago. 
1. The Relation between Sufficiency and Invariance, I: Theory, Donald L. Burkholder, 
University of Illinois. 
2. The Relation between Sufficiency and Invariance, II: Applications, W. Jackson Hall, 
University of North Carolina. 
3. Optimum Property and Inadmissibility of Sequential Tests, Robert A. Wijsman, 
University of Illinois. 


12:45-2:20 p.m.—Truncation (SPES/ASA and IMS) 


Chairman: Harry Smith, Procter & Gamble Co. 
1. An Example of Estimation from Doubly Truncated Samples, Richard V. Lave, Bell 
Telephone Laboratories, Murray Hill, N. J. 
2. Some Tests on Parameters of the Exponential Failure Law, Satya Dubey, Michigan 
State University. 
3. The Application of the Truncated Normal in Process Control and Inventory Study, Harry 
Smith and Donald Grace, Procter and Gamble Company. 


2:30-3:50 p.m.—Contributed Papers 


Chairman: Jack Silber, Roosevelt University. 
1. An Extension of a Theoretical Gene Model to Provide for Genic-Environmental Inter- 
action Terms, Cecil L. Kaller and Virgil L. Anderson, Purdue University. 
2. On Some Extensions of Sampling with Probability Proportional to Size, D. K. Ray- 
Chaudhuri, Case Institute of Technology. 
3. Note on Significance of Differences for Attributes, Irving W. Burr, Purdue University. 
4. Test for Regression Coefficients when Errors are Correlated, M. M. Siddiqui, National 
Bureau of Standards. 
5. On the Analysis of Split-Plot Experiments, H. Leon Harter, Wright Air Development 
Center. 
6. Comparison of Estimators for some Generalized Poisson Distributions, S. K. Katti, 
Florida State University, and J. Gurland, Iowa State University. 
7. Power of Some Two-Sample Distribution Free Tests, B. V. Sukhatme, Michigan 
State University. (By title) 
8. Some New Single Level Continuous Sampling Plans, John S. White, Minneapolis- 
Honeywell Regulator Company. (By title) 


4:00-5:30 p.m.—Probability 


Chairman: Louis J. Cote, Syracuse University. 
1. Remarks on the Coding Theorem, Patrick Billingsley, University of Chicago. 
2. Probability in the Tail of a Distribution, Melvin Katz, University of Chicago. 
3. New Methods for Studying First Passage Probabilities, Johan H. B. Kemperman, 
Purdue University. 


SATURDAY, APRIL 9, 1960 
9:00-10:20 a.m.—Production 


Chairman: Harley E. McKean, Purdue University. 
1. A Continuous Production Model, I. Richard Savage, University of Minnesota. 
2. Rectifying Inspection of Lots, F. J. Anscombe, Princeton University and University 
of Chicago. 
Discussant: Irving W. Burr, Purdue University. 


10:30-12:30 p.m.—Contributed Papers 


Chairman: Harley E. McKean, Purdue University. 
1. Generalization of Thompson's Distribution III, André G. Laurent, Wayne State 
University. 
2. An Expansion for the Quadrivariate Normal Integral for a Stationary Markov Process, 
J. A. McFadden, Purdue University. 
3. Existence of Wald’s Sequential Test in the General Case, Robert A. Wijsman, University 
of Illinois. 
4. A Characterization of the Uniform Distribution in Compact Topological Groups, James 
H. Stapleton, Michigan State University. 
5. Application to Stochastic Processes of a Uniqueness Property of the Rectangular Dis- 
tribution, Herman Rubin, Michigan State University. 
6. A New Class of Sequential Decision Rules for Symmetric Problems, W. Jackson Hall, 
University of North Carolina. (By title) 
7. An Expansion for the Quadrivariate Normal Integral when pir = pu = px = 0, J. A. 
McFadden, Purdue University. (By title) 
8. Nonparametric Tests for Location and Scale Parameters in a Mixed Model with Discrete 
and Continuous Variables, Shashikala B. Sukhatme, Michigan State University. 
(By title) 
9. Some Results in the Analysis of Variance, Selig Starr, George Washington University. 
(By title) 
10. Normal Approximation to the Distribution of Two Independent Binomials, Conditional 
on Fixed Sum, J. Hannan, Michigan State University, and W. Harkness, Pennsyl- 
vania State University. (By title) 
11. A Characterization of Some Location and Scale Parameter Families, Sudhish G. Ghurye, 
Northwestern University. (By title) 
12. On Evaluation of Negative Binomial Distribution Function, G. P. Patil, University of 
Michigan. (By title) 




PUBLICATIONS RECEIVED 


Anuario Estadistico de Espana, Vol. 34, Presidencia del Gobierno, Instituto Nacional de 
Estadistica, Ferraz, 41, Madrid, Spain, 1959, 1099 pp. 

Food and Agricultural Organization of the United Nations, Yearbook of Fishery Statistics, 
Vol. 9, New York, Columbia University Press, 1958, $4.50. 

Kapur, J. N. and Saxena, H. C., Mathematical Statistics, Delhi, India, S. Chand and Co., 
Rs. 10/-, 400 pp. 

Proceedings of Symposia in Applied Mathematics, Vol. 9, Providence, R. I., American Mathe- 
matical Society, 1959, 195 pp. 





BIOMETRIKA 


Volume 47, Parts 1 and 2 Contents June 1960 


Memoirs: 


Bartlett, M. S., Gower, J. C. & Leslie, P. H. A comparison of theoretical and empirical results for some 
stochastic population [...]. Kendall, David G. Birth-and-death processes, and the [...]. 
Ashford, J. R., Smith, C. S. & Brown, Susannah. The quantal response [...] 
assays on the same subject. Walker, A. M. Some consequences of superim[...]. 
[...], P. D. Deterministic customer impatience in the queueing system [...]. 
[...] & Fix, Evelyn. [...] numbers. Srivastava, A. [...]. 
Kastenbaum, Marvin A. The separation of molecular com[...] countercurrent dialysis: a stochastic 
process. Saw, J. G. A note on the error after a number of terms of the David-Johnson series for the expected 
values of normal order statistics. Watson, G. S. More significance tests on the sphere. Johnson, N. L. An 
approximation to the multinomial distribution: some properties and applications. Vagholkar, M. K. & 
Wetherill, G. B. The most economical binomial sequential probability ratio test. McGregor, J. R. An 
approximate test for serial correlation in polynomial [...]. [...] Nomograms for fitting 
logistic function by maximum likelihood. Haight, Frank A. & Breuer, Melvin Allen. The Borel-[...]. 
Burr, E. J. The distribution of Kendall's score S for a pair of tied ran[...]. 
Contributions by [...], C. Burrows, [...], P. D. Finch, R. G. Laha & E. Lukacs, [...], H. D. Patterson, 
B. R. Rao, [...], M. E. Wise. 


Reviews: Other Books Received: 
The subscription, payable in advance, is 54/- (or $8.00) per volume (including postage). Cheques should be 
made payable to Biometrika, crossed “A/c Biometrika Trust” and sent to the Secretary, Biometrika Office, 
University College, London, W.C.1. All foreign cheques must be drawn on a Bank having a London agency. 


Issued by THE BIOMETRIKA OFFICE, University College, London 





ESTADISTICA 


Journal of the Inter American Statistical Institute 
Volume XVII, No. 64 Contents December 1959 


Sobre los Dos Aspectos Diferentes del Método Representativo: El Método de Mues- 
treo Estratificado y el Método de Selección Dirigida (traducción) ..... Jerzy Neyman 
El Muestreo y la Situación Actual de la Estadística ..... Francisco Azorín Poch 
Purposive and Stratified Random Sampling ..... Frederick F. Stephan 
Estadísticas y Parámetros ..... Carlos E. Dieulefait 
La Théorie de l’Estimation et les Sondages ..... P. Thionet 
Some Stratified Sampling Plans in Replicated Designs ..... W. Edwards Deming 
El Problema de la Estratificación Optima (traducción) ..... Tore Dalenius 
El Muestreo Estratificado Multiparamétrico (traducción) ..... Tore Dalenius 
Recent Advances in Sampling Techniques ..... Harry G. Romig 
Levantamento por Amostragem da Safra de Trigo de 1958 no Rio Grande do Sul 
..... Thomas Jabine, Amaro da Costa Monteiro y Rubens Jorge de Campo 

Legal Provisions. Institute Affairs. Statistical News. Publications. 

Published quarterly Annual subscription price $3.00 (U. S.) 

INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D.C. 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 28, No. 3 - July 1960 


James Duesenberry, Otto Eckstein, and Gary Fromm: Stability and Instability in the American
Economy

A. H. Land and A. G. Doig: An Automatic Method of Solving Discrete Programming Problems

Ralph E. Gomory and William J. Baumol: Integer Programming and Pricing

I. Richard Savage and Karl W. Deutsch: A Statistical Model for the Gross Analysis of Transaction

A. L. Nagar: A Monte Carlo Study of Alternative Simultaneous Equation Estimators
Gregory C. Chow: Tests of Equality between Sets of Coefficients in Two Linear Regressions
Lionel W. McKenzie: A Contribution to the Theory of Competitive Equilibrium


Wilfred Candler: A Short-Cut Method for the Complete Solution of Game Theory and Feed-Mix Problems


Charles W. Howe: An Alternative Proof of the Existence of General Equilibrium in a von Neumann


Kenneth J. Arrow and Leonid Hurwicz: Some Remarks on the Equilibria of Economic Systems
Report of the Amsterdam Meeting

Report of the Washington Meeting

Book Reviews

Announcements and Notes





SANKHYA 
The Indian Journal of Statistics 


Edited by P. C. Mahalanobis 
Vol. 22, Parts 1 & 2, 1960


Classification of natural and plantation teak (Tectona grandis) grown at different localities of India and
Burma with respect to its physical and mechanical properties..........K. R. Nair and H. K. Mukherji
A brief history of the organisation of official statistics in India during the British period S. Subramanian
Calculation of sampling errors for index numbers K. S. Banerjee
Type-study on peak period in harvesting Aman paddy, West Bengal, December 1954-January 1955
R. K. Som, G. C. Bhattacharyya and N. K. Namboodiri

Next steps in planning P. C. Mahalanobis 
Industrialization of underdeveloped countries—a means to peace P. C. Mahalanobis 
Food resources and population of India—A historical study P. C. Bansil 
Test scoring by the 602A calculating punch Ajit Haldar 
Report: National sample survey number thirteen: Report of household transport operations 
Corrigenda 


Annual Subscription: 30 rupees ($10.00), 10 rupees ($3.50) per issue.
Back Numbers: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue.
Subscriptions and orders for back numbers should be sent to 


STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India








METRON 


International Review of Statistics 


Editor: Corrado Gini, Rome (Italy)

Editorial Committee: E. P. Billeter, Fribourg (Switzerland); F. P. Cantelli,
Rome (Italy); C. Dagum, Cordoba (Argentine); M. P. Geppert, Frankfurt a. M.
(Germany); B. Gil, Jerusalem (Israel); L. Savage, Chicago (U.S.A.); H. von
Schelling, Schenectady, New York (U.S.A.); P. Steriotis, Athens (Greece);
P. Whittle, Wellington (New Zealand); S. Wilks, Princeton (U.S.A.); H. Wold,
Stockholm (Sweden); R. Yüceulug, Ankara (Turkey).


Vol. XX - N. 1-2, 3-4 Contents 1960
O. M. J. Mittmann, Eine Methode zur Auffindung ursächlicher Zusammenhänge mittels partieller Korrelationskoe

C. Dagum, Teoría de la Transvariación. Sus aplicaciones a la Economía

V. Castellano, Sull'insieme delle distribuzioni doppie e triple risultanti dall'associazione una ad una
delle unità di due o tre distribuzioni semplici aventi lo stesso numero di unità

D. P. Banerjee, On the forms of some invariants of probability distribution

Carlo Benedetti, Su alcune singolari questioni statistiche ed econometriche connesse con le medie

Corrado Gini, Introduction to Statistics

Bibliografia 


Subscription price (per volume): or the equivalent in other currencies. Articles and
reviews may be written in English, Italian, French, German or Spanish. Contributors
receive free of charge 25 copies of their publications. Manuscripts submitted for publication,
as well as subscriptions, should be addressed to Prof. Corrado Gini, Università di Roma,
Via Terme di Diocleziano, 10. The Editors will not be responsible for the safe return of
the origi








ROYAL STATISTICAL SOCIETY 


Series B ( Methodological) 


30s. per part Vol. 22, No. 1, 1960 Annual subscription, post free: £3 2s. (U.S. $9)
CONTENTS 


On selecting the largest of k normal population means. By C. W. Dunnett (with Discussion).

Confidence regions in nonlinear estimation. By E. M. L. Beale (with Discussion).

The busy period in relation to the single-server queueing system with general independent arrivals and Erlangian
service times. By B. W. Conolly.

A note on prediction from an autoregressive process using fiducial probability. By A. D. Roy.

Some results for the queue with Poisson arrivals. By N. U. Prabhu.

A queueing problem in which the arrival times of the customers are scheduled. By A. Mercer.

The two-pack matching problems. By A. W. Joseph and M. T. L. Bizley.

On the theory of classical regression and double sampling estimation. By B. D. Tikkiwal.

Estimation of parameters in time-series regression models. By J. Durbin.

Maximum-likelihood estimation procedures and associated tests of significance. By J. Aitchison and S. D.
Silvey.

Regression analysis where there is prior information about supplementary variables. By D. R. Cox. 

Tables of the integral of the bivariate normal distribution function over an offset circle. By J. R. Lowe.

On the distribution of the weighted difference of two independent Student variables. By H. Ruben.


The Royal Statistical Society, 21, Bentinck Street, London, W. 1 





Announcing a New Journal 


SOVIET MATHEMATICS 


DOKLADY 


A Translation of all the Pure Mathematics Sections 
of Doklady Akademii Nauk SSSR 


The total number of pages of the Russian journal to be translated 
in 1960 will be about 1500. The first issue of this new publication 
of the American Mathematical Society will contain translations of the 
January and February issues of Doklady. 


Six Issues a year 


Domestic Subscriptions... . 
Foreign Subscriptions. . 
Single Issues 


Send Orders to 
American Mathematical Society 


190 Hope Street 
Providence 6, Rhode Island 





TECHNOMETRICS 


A Journal of Statistics 
for the Physical, Chemical and Engineering Sciences 
Vol. 2, No. 1 and 2, Contents, February /May 1960 


Some Remarks on Wild Observations W. H. Kruskal; Statistical Estimation of the Gasoline Octane Number
Requirement of New Model Automobiles C. S. Brinegar and R. R. Miller; The Effect of Sequential Batching
for Acceptance-Rejection Sampling Upon Sample Assurance of Total Product Quality M. Halperin and
G. L. Burrows; Elements of the Theory of Extreme Values B. Epstein; System Efficiency and Reliability
R. E. Barlow and L. C. Hunter; Aids for Fitting the Gamma Distribution by Maximum Likelihood J. A.
Greenwood and D. Durand; Experimental Designs to Adjust for Time Trends Hubert M. Hill; Tests for the
Validity of the Assumption that the Underlying Distribution of Life is Exponential, Part I B. Epstein;
Programming Fisher's Exact Method of Comparing Two Percentages W. H. Robertson; Misclassified Data
from a Binomial Population A. Clifford Cohen, Jr.; Rejection of Outliers F. J. Anscombe; Locating Outliers
in Factorial Experiments C. Daniel; Discussion of the Papers of Messrs. Anscombe and Daniel W. H.
Kruskal, T. S. Ferguson, J. W. Tukey and E. J. Gumbel; Tests for the Validity of the Assumption that the
Underlying Distribution of Life is Exponential, Part II B. Epstein; The Partial Duplication of Response
Surface Designs O. Dykstra; A Rank Sum Test for Comparing all Pairs of Treatments R. G. D. Steel; The
Percentile Points of Distributions Having Known Cumulants R. A. Fisher and E. A. Cornish; An Approximation
to the Negative Moments of the Positive Binomial Useful in Life Testing W. Mendenhall and E. H.
Lehman, Jr.; Order Statistics from the Gamma Distribution S. S. Gupta; Parallel Fractional Replicates C.
Daniel.


Technometrics is published quarterly in February, May, August, and November. The annual non-member 
subscription rate is $8.00. To members of the American Statistical Association and the American Society for 
Quality Control the rate is $6.00. Checks should be made payable to Technometrics and addressed to Tech- 
nometrics, Post Office Box 587, Benjamin Franklin Station, Washington 6, D. C. 





JOURNAL OF 
THE AMERICAN STATISTICAL ASSOCIATION 
Volume 55 June, 1960 Number 290 


Bechhofer, Robert: A Multiplicative Model for Analyzing Variances which are Affected by Several Factors. 

Cohen, A. Clifford, Jr.: Estimation in the Truncated Poisson Distribution when Zeros and Some Ones Are
Missing.

Hays, William L.: A Note on Average Tau as a Measure of Concordance. 

Hogg, Robert V.: Certain Uncorrelated Statistics. 

Maisel, Sherman J.: Changes in the Rate and Components of Household Formation.

Lord, Fredric M.: Large-Sample Covariance Analysis When the Control Variable is Fallible.

Muth, John F.: Optimal Properties of Exponentially Weighted Forecasts.

Quandt, Richard E.: Tests of the Hypothesis that a Linear Regression System Obeys Two Separate Regimes.

Rider, Paul R.: Variance of the Median of Samples From a Cauchy Distribution. 

Spiegelglas, Stephen: A Statistical Investigation of the Industrialization Controversy.


BOOK REVIEWS 


AMERICAN STATISTICAL ASSOCIATION 
1737 K Street, N.W. Washington 6, D.C. 





TRABAJOS DE ESTADISTICA 


Review published by "Instituto de Investigaciones Estadísticas" of the "Consejo
Superior de Investigaciones Científicas." Madrid, Spain.


Vol. X CONTENTS Cuaderno II 


P. Zoroa El teorema de la probabilidad producto
R. Fortet Observations discrètes périodiques.
Ed. Franckx La loi forte des grands nombres des variables uniformément bornées.
G. Tintner The application of decision theory of probability to a simple inventory problem.
NOTAS 


V. Castro San Martín
Sobre posibilidad de empleo de la artillería para la apertura de brechas en los campos terrestres de minas.


F. Pepron: Limiti alle possibilità di misurazione della produttività


CRONICAS BIBLIOGRAFIA CUESTIONES Y EJERCICIOS 


For everything in connection with works, exchanges and subscription write to Professor Sixto R
de Investigaciones Estadísticas, Consejo Superior de Investigaciones Científicas (Serrano,
Spain. The Review is composed of three fascicles published three times a year (about 350 pages), and
price is 100 pesetas for Spain and South America and $4.00 U.S.A. for all other countries.














THE INSTITUTE OF MATHEMATICAL STATISTICS 


(Organised September 12, 1935) 
OFFICERS 
President:
John W. Tukey, mail to Bell Telephone Laboratories,
Murray Hill, New Jersey
President-Elect: 


cy, Calor Department of Statistics, University of California, Berk- 


G. E. Nicholson, Jr., Department of Statistics, University of North Carolina


Treasurer: 
A. H. Bowker,
Department of Statistics, Stanford University, Stanford, 
Editor: 
William Kruskal, Department of Statistics, Eckhart Hall, University of Chicago
The purpose of the Institute of Mathematical Statistics is to encourage the 


MATICAL STATISTICS are $10.00 per year for residents of the United States
or Canada and $5.00 per year for residents elsewhere. There are special
rates for some other classes of members.
membership in the Institute should be sent to the Secretary of the







The First-Passage Moments and the Invariant Measure of a Markov
Chain. John Lamperti 515
Note on the Distribution of Locally Maximal Elements in a Random
ManrsmaL, 


Contributions to the Theory of Rank Order Statistics: Computation
Rules for Probabilities of Rank Orders. I. Richard Savage 519




MEETINGS OF THE INSTITUTE 
TENTATIVE SCHEDULE 





